ADAPTIVE ESTIMATION FOR BIFURCATING MARKOV CHAINS


S. VALERE BITSEKI PENDA, MARC HOFFMANN AND ADELAIDE OLIVIER 


Abstract. In a first part, we prove Bernstein-type deviation inequalities for bifurcating Markov
chains (BMC) under a geometric ergodicity assumption, completing former results of Guyon
and Bitseki Penda, Djellout and Guillin. These preliminary results are the key ingredient to
implement nonparametric wavelet thresholding estimation procedures: in a second part, we
construct nonparametric estimators of the transition density of a BMC, of its mean transition
density and of the corresponding invariant density, and show smoothness adaptation over various
multivariate Besov classes under $L^p$-loss error, for $1 \le p < \infty$. We prove that our estimators are
(nearly) optimal in a minimax sense. As an application, we obtain new results for the estimation
of the splitting size-dependent rate of growth-fragmentation models and we extend the statistical
study of bifurcating autoregressive processes.

1. Introduction

1.1. Bifurcating Markov chains. Bifurcating Markov Chains (BMC) are Markov chains indexed
by a tree (Athreya and Kang [1], Benjamini and Peres [6], Takacs [39]) that are particularly
well adapted to model and understand dependent data mechanisms involved in cell division. To
that end, bifurcating autoregressive models (a specific class of BMC, also considered in the paper)
were first introduced by Cowan and Staudte [16]. More recently Guyon [28] systematically studied
BMC in a general framework. In continuous time, BMC encode certain piecewise deterministic
Markov processes on trees that serve as the stochastic realisation of growth-fragmentation models
(see e.g. Doumic et al. [26], Robert et al. [38] for modelling cell division in Escherichia coli and
the references therein).

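For $m \ge 0$, let $\mathbb{G}_m = \{0,1\}^m$ (with $\mathbb{G}_0 = \{\emptyset\}$) and introduce the infinite genealogical tree

$$\mathbb{T} = \bigcup_{m=0}^{\infty} \mathbb{G}_m.$$

For $u \in \mathbb{G}_m$, set $|u| = m$ and define the concatenation $u0 = (u,0) \in \mathbb{G}_{m+1}$ and $u1 = (u,1) \in \mathbb{G}_{m+1}$.
A bifurcating Markov chain is specified by 1) a measurable state space $(\mathcal{S}, \mathfrak{S})$ with a Markov
kernel (later called $\mathbb{T}$-transition) $\mathcal{P}$ from $(\mathcal{S}, \mathfrak{S})$ to $(\mathcal{S}\times\mathcal{S}, \mathfrak{S}\otimes\mathfrak{S})$ and 2) a filtered probability space
$(\Omega, \mathcal{F}, (\mathcal{F}_m)_{m\ge 0}, \mathbb{P})$. Following Guyon, we have the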

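Definition 1. A bifurcating Markov chain is a family $(X_u)_{u\in\mathbb{T}}$ of random variables with value in
$(\mathcal{S}, \mathfrak{S})$ such that $X_u$ is $\mathcal{F}_{|u|}$-measurable for every $u \in \mathbb{T}$ and

$$\mathbb{E}\Big[\prod_{u\in\mathbb{G}_m} g_u(X_u, X_{u0}, X_{u1}) \,\Big|\, \mathcal{F}_m\Big] = \prod_{u\in\mathbb{G}_m} \mathcal{P}g_u(X_u)$$

for every $m \ge 0$ and any family of (bounded) measurable functions $(g_u)_{u\in\mathbb{G}_m}$, where
$\mathcal{P}g(x) = \int_{\mathcal{S}\times\mathcal{S}} g(x,y,z)\,\mathcal{P}(x, dy\,dz)$ denotes the action of $\mathcal{P}$ on $g$.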

The distribution of $(X_u)_{u\in\mathbb{T}}$ is thus entirely determined by $\mathcal{P}$ and an initial distribution for $X_\emptyset$.
Informally, we may view $(X_u)_{u\in\mathbb{T}}$ as a population of individuals, cells or particles indexed by $\mathbb{T}$
and governed by the following dynamics: to each $u \in \mathbb{T}$ we associate a trait $X_u$ (its size, lifetime,
growth rate, DNA content and so on) with value in $\mathcal{S}$. At its time of death, the particle $u$ gives rise
to two children $u0$ and $u1$. Conditional on $X_u = x$, the trait $(X_{u0}, X_{u1}) \in \mathcal{S}\times\mathcal{S}$ of the offspring of
$u$ is distributed according to $\mathcal{P}(x, dy\,dz)$.


For $n \ge 0$, let $\mathbb{T}_n = \bigcup_{m=0}^{n} \mathbb{G}_m$ denote the genealogical tree up to the $n$-th generation. Assume
we observe $\mathbb{X}^n = (X_u)_{u\in\mathbb{T}_n}$, i.e. we have $2^{n+1} - 1$ random variables with value in $\mathcal{S}$. There are
several objects of interest that we may try to infer from the data $\mathbb{X}^n$. Similarly to fragmentation
processes (see e.g. Bertoin [9]) a key role for both asymptotic and non-asymptotic analysis of
bifurcating Markov chains is played by the so-called tagged-branch chain, as shown by Guyon [28]
and Bitseki Penda et al. [11]. The tagged-branch chain $(Y_m)_{m\ge 0}$ corresponds to a lineage picked
at random in the population $(X_u)_{u\in\mathbb{T}}$: it is a Markov chain with value in $\mathcal{S}$ defined by $Y_0 = X_\emptyset$
and for $m \ge 1$,

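$$Y_m = X_{\emptyset\epsilon_1\cdots\epsilon_m},$$

where $(\epsilon_m)_{m\ge 1}$ is a sequence of independent Bernoulli variables with parameter 1/2, independent
of $(X_u)_{u\in\mathbb{T}}$. It has transition

$$\mathcal{Q} = (\mathcal{P}_0 + \mathcal{P}_1)/2,$$

obtained from the marginal transitions

$$\mathcal{P}_0(x, dy) = \int_{z\in\mathcal{S}} \mathcal{P}(x, dy\,dz) \quad\text{and}\quad \mathcal{P}_1(x, dz) = \int_{y\in\mathcal{S}} \mathcal{P}(x, dy\,dz)$$

of $\mathcal{P}$. Guyon proves in [28] that if $(Y_m)_{m\ge 0}$ is ergodic with invariant measure $\nu$, then the convergence

$$\frac{1}{|\mathbb{G}_n|}\sum_{u\in\mathbb{G}_n} g(X_u) \to \int_{\mathcal{S}} g(x)\,\nu(dx) \tag{1}$$

holds almost-surely as $n \to \infty$ for appropriate test functions $g$. Moreover, we also have convergence
results of the type

$$\frac{1}{|\mathbb{T}_n|}\sum_{u\in\mathbb{T}_n} g(X_u, X_{u0}, X_{u1}) \to \int_{\mathcal{S}} \mathcal{P}g(x)\,\nu(dx) \tag{2}$$

almost-surely as $n \to \infty$. These results are appended with central limit theorems (Theorem 19
of [28]) and Hoeffding-type deviations inequalities in a non-asymptotic setting (Theorem 2.11 and
2.12 of Bitseki Penda et al. [11]).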
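To make the objects of this section concrete, here is a minimal simulation sketch in Python. It is an illustration under stated assumptions, not part of the paper: the tree is stored in heap order (node `i` has children `2i+1` and `2i+2`), and `toy_kernel` is a hypothetical autoregressive T-transition chosen only so that the invariant mean is 0. It draws a tree up to generation `n`, extracts one tagged branch, and computes the generation average appearing on the left-hand side of (1) for g(x) = x.

```python
import random

def simulate_bmc(n, sample_children, x0, rng):
    """Simulate (X_u) for u in T_n; node i has children 2i+1 and 2i+2 (heap order)."""
    size = 2 ** (n + 1) - 1
    x = [0.0] * size
    x[0] = x0
    for i in range(2 ** n - 1):  # every node of generations 0..n-1 has two children
        x[2 * i + 1], x[2 * i + 2] = sample_children(x[i], rng)
    return x

def generation(x, m):
    """Traits of generation G_m, stored at indices 2^m - 1, ..., 2^(m+1) - 2."""
    return x[2 ** m - 1: 2 ** (m + 1) - 1]

def tagged_branch(x, n, rng):
    """Follow one lineage chosen by independent fair coin flips: Y_0, ..., Y_n."""
    i, path = 0, [x[0]]
    for _ in range(n):
        i = 2 * i + 1 + rng.randint(0, 1)
        path.append(x[i])
    return path

def toy_kernel(x, rng):
    # Hypothetical T-transition (assumption, for illustration only):
    # the two children follow linear autoregressions with Gaussian noise.
    return 0.5 * x + rng.gauss(0, 1), -0.3 * x + rng.gauss(0, 1)

rng = random.Random(0)
n = 12
tree = simulate_bmc(n, toy_kernel, 0.0, rng)
last = generation(tree, n)
empirical_mean = sum(last) / len(last)  # left-hand side of (1) with g(x) = x
```

For this toy kernel the tagged-branch chain has a centred invariant law, so `empirical_mean` should be close to 0 for moderate `n`.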


1.2. Objectives. The observation of $\mathbb{X}^n$ enables us to identify $\nu(dx)$ as $n \to \infty$ thanks to (1).
Consequently, convergence (2) reveals $\mathcal{P}$ and therefore $\mathcal{Q}$ is identified as well, at least asymptotically.
The purpose of the present paper is at least threefold:

1) Construct, under appropriate regularity conditions, estimators of $\nu$, $\mathcal{Q}$ and $\mathcal{P}$ and study
their rates of convergence as $n \to \infty$ under various loss functions. When $\mathcal{S} \subseteq \mathbb{R}$ and when
$\mathcal{P}$ is absolutely continuous w.r.t. the Lebesgue measure, we estimate the corresponding
density functions under various smoothness class assumptions and build smoothness adaptive
estimators, i.e. estimators that achieve an optimal rate of convergence without prior
knowledge of the smoothness class.

2) Apply these constructions to investigate further specific classes of BMC. These include
binary growth-fragmentation processes, where we subsequently estimate adaptively the
splitting rate of a size-dependent model, thus extending previous results of Doumic et al.
[26], and bifurcating autoregressive processes, where we complete previous studies of Bitseki
Penda et al. [12] and Bitseki Penda and Olivier [13].

3) For the estimation of $\nu$, $\mathcal{Q}$ and $\mathcal{P}$ and the subsequent estimation results of 2), prove that
our results are sharp in a minimax sense.


Our smoothness adaptive estimators are based on wavelet thresholding for density estimation
(Donoho et al. [24] in the generalised framework of Kerkyacharian and Picard [32]). Implementing
these techniques requires concentration properties of empirical wavelet coefficients. To that end,
we prove new deviation inequalities for bifurcating Markov chains that we develop independently
in a more general setting, when $\mathcal{S}$ is not necessarily restricted to $\mathbb{R}$. Note also that when $\mathcal{P}_0 = \mathcal{P}_1$,
we have $\mathcal{Q} = \mathcal{P}_0 = \mathcal{P}_1$ as well and we retrieve the usual framework of nonparametric estimation
of Markov chains when the observation is based on $(Y_i)_{1\le i\le n}$ solely. We are therefore in the line
of combining and generalising the study of Clémençon [15] and Lacour [33, 34] that both consider
adaptive estimation for Markov chains when $\mathcal{S} \subseteq \mathbb{R}$.


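1.3. Main results and organisation of the paper. In Section 2, we generalise the Hoeffding-type
deviations inequalities of Bitseki Penda et al. [11] for BMC to Bernstein-type inequalities:
when $\mathcal{Q}$ is uniformly geometrically ergodic (Assumption 3 below), we prove in Theorem 5 deviations
of the form

$$\mathbb{P}\Big(\frac{1}{|\mathbb{G}_n|}\sum_{u\in\mathbb{G}_n} g(X_u, X_{u0}, X_{u1}) - \int \mathcal{P}g\,d\nu \ge \delta\Big) \le \exp\Big(-\frac{\kappa|\mathbb{G}_n|\delta^2}{\Sigma_n(g) + |g|_\infty\delta}\Big)$$

and

$$\mathbb{P}\Big(\frac{1}{|\mathbb{T}_n|}\sum_{u\in\mathbb{T}_n} g(X_u, X_{u0}, X_{u1}) - \int \mathcal{P}g\,d\nu \ge \delta\Big) \le \exp\Big(-\frac{\widetilde{\kappa}\, n^{-1}|\mathbb{T}_n|\delta^2}{\Sigma_n(g) + |g|_\infty\delta}\Big),$$

where $\kappa, \widetilde{\kappa} > 0$ only depend on $\mathcal{Q}$ and $\Sigma_n(g)$ is a variance term which depends on a combination
of the $L^p$-norms of $g$ for $p = 1, 2, \infty$ w.r.t. a common dominating measure for the family
$\{\mathcal{Q}(x, dy), x \in \mathcal{S}\}$. The precise results are stated in Theorems 4 and 5.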

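Section 3 is devoted to the statistical estimation of $\nu$, $\mathcal{Q}$ and $\mathcal{P}$ when $\mathcal{S} \subseteq \mathbb{R}$ and the family
$\{\mathcal{P}(x, dy\,dz), x \in \mathcal{S}\}$ is dominated by the Lebesgue measure on $\mathbb{R}^2$. In that setting, abusing notation
slightly, we have $\nu(dx) = \nu(x)dx$, $\mathcal{Q}(x, dy) = \mathcal{Q}(x,y)dy$ and $\mathcal{P}(x, dy\,dz) = \mathcal{P}(x,y,z)dy\,dz$ for
some functions $x \mapsto \nu(x)$, $(x,y) \mapsto \mathcal{Q}(x,y)$ and $(x,y,z) \mapsto \mathcal{P}(x,y,z)$ that we reconstruct
nonparametrically. Our estimators are constructed in several steps:

i) We approximate the functions $\nu(x)$, $f_{\mathcal{Q}}(x,y) = \nu(x)\mathcal{Q}(x,y)$ and $f_{\mathcal{P}}(x,y,z) = \nu(x)\mathcal{P}(x,y,z)$
by atomic representations

$$\nu(x) \approx \sum_{\lambda\in\mathcal{V}^1(\nu)} \langle \nu, \psi^1_\lambda\rangle\, \psi^1_\lambda(x),$$

$$f_{\mathcal{Q}}(x,y) \approx \sum_{\lambda\in\mathcal{V}^2(f_{\mathcal{Q}})} \langle f_{\mathcal{Q}}, \psi^2_\lambda\rangle\, \psi^2_\lambda(x,y),$$

$$f_{\mathcal{P}}(x,y,z) \approx \sum_{\lambda\in\mathcal{V}^3(f_{\mathcal{P}})} \langle f_{\mathcal{P}}, \psi^3_\lambda\rangle\, \psi^3_\lambda(x,y,z),$$

where $\langle\cdot,\cdot\rangle$ denotes the usual $L^2$-inner product (over $\mathbb{R}^d$ for $d = 1,2,3$ respectively) and
$(\psi^d_\lambda, \lambda \in \mathcal{V}^d(\cdot))$ is a collection of functions (wavelets) in $L^2(\mathbb{R}^d)$ that are localised in time
and frequency, indexed by a set $\mathcal{V}^d(\cdot)$ that depends on the signal itself¹.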

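ii) We estimate

$\langle \nu, \psi^1_\lambda\rangle$ by $|\mathbb{T}_n|^{-1}\sum_{u\in\mathbb{T}_n} \psi^1_\lambda(X_u)$,

$\langle f_{\mathcal{Q}}, \psi^2_\lambda\rangle$ by $|\mathbb{T}^\star_n|^{-1}\sum_{u\in\mathbb{T}^\star_n} \psi^2_\lambda(X_{u^-}, X_u)$,

$\langle f_{\mathcal{P}}, \psi^3_\lambda\rangle$ by $|\mathbb{T}_{n-1}|^{-1}\sum_{u\in\mathbb{T}_{n-1}} \psi^3_\lambda(X_u, X_{u0}, X_{u1})$,

where $X_{u^-}$ denotes the trait of the parent of $u$ and $\mathbb{T}^\star_n = \mathbb{T}_n \setminus \mathbb{G}_0$, and specify a selection
rule for $\mathcal{V}^d(\cdot)$ (with the dependence in the unknown function somehow replaced by an
estimator). The rule is dictated by hard thresholding over the estimation of the coefficients
that are kept only if they exceed some noise level, tuned with $|\mathbb{T}_n|$ and prior knowledge on
the unknown function, as follows by standard density estimation by wavelet thresholding
(Donoho et al. [25], Kerkyacharian and Picard [32]).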

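iii) Denoting by $\widehat{\nu}_n(x)$, $\widehat{f}_n(x,y)$ and $\widehat{f}_n(x,y,z)$ the estimators of $\nu(x)$, $f_{\mathcal{Q}}(x,y)$ and $f_{\mathcal{P}}(x,y,z)$
respectively constructed in Step ii), we finally take as estimators for $\mathcal{Q}(x,y)$ and $\mathcal{P}(x,y,z)$
the quotient estimators

$$\widehat{\mathcal{Q}}_n(x,y) = \frac{\widehat{f}_n(x,y)}{\widehat{\nu}_n(x)} \quad\text{and}\quad \widehat{\mathcal{P}}_n(x,y,z) = \frac{\widehat{f}_n(x,y,z)}{\widehat{\nu}_n(x)}$$

provided $\widehat{\nu}_n(x)$ exceeds a minimal threshold.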

Beyond the inherent technical difficulties of the approximation Steps i) and iii), the crucial novel
part is the estimation Step ii), where Theorems 4 and 5 are used to estimate precisely the
probability that the thresholding rule applied to the empirical wavelet coefficient is close in effect to
thresholding the true coefficients.

When $\nu$, $\mathcal{Q}$ or $\mathcal{P}$ (identified with their densities w.r.t. appropriate dominating measures) belong
to an isotropic Besov ball of smoothness $s$ measured in $L^\pi$ over a domain $\mathcal{D}^d$ in $\mathbb{R}^d$, with $s > d/\pi$
and $d = 1,2,3$ respectively, we prove in Theorems 8, 9 and 10 that if $\mathcal{Q}$ is uniformly geometrically
ergodic, then our estimators achieve the rate $|\mathbb{T}_n|^{-\alpha_d(s,p,\pi)}$ in $L^p(\mathcal{D}^d)$-loss, up to additional $\log|\mathbb{T}_n|$
terms, where

$$\alpha_d(s,p,\pi) = \min\Big\{\frac{s}{2s+d},\ \frac{s + d(1/p - 1/\pi)}{2s + d(1 - 2/\pi)}\Big\}$$

is the usual exponent for the minimax rate of estimation of a $d$-variate function with order of
smoothness $s$ measured in $L^\pi$ in $L^p$-loss error. This rate is nearly optimal in a minimax sense for
$d = 1$, as follows from the particular case $\mathcal{Q}(x, dy) = \nu(dy)$ that boils down to density estimation with
$|\mathbb{T}_n|$ data: the optimality is then a direct consequence of Theorem 2 in Donoho et al. [25]. As for
the case $d = 2$ and $d = 3$, the structure of BMC comes into play and we need to prove a specific
optimality result, stated in Theorems 9 and 10. We rely on classical lower bound techniques for
density estimation and Markov chains (Hoffmann [31], Clémençon [15], Lacour [33, 34]).

¹The precise meaning of the symbol $\approx$ and the properties of the $\psi_\lambda$'s are stated precisely in Section 3.1.
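As a quick numerical companion to the rate exponent above, the following Python sketch evaluates $\alpha_d(s,p,\pi)$ as the minimum of the two regime exponents, s/(2s+d) and (s+d(1/p-1/π))/(2s+d(1-2/π)). The function name `alpha` and the convention `pi = math.inf` for π = ∞ are illustrative choices, not notation from the paper.

```python
import math

def alpha(d, s, p, pi):
    """Exponent alpha_d(s, p, pi): minimum of the 'dense' and 'sparse' regime exponents.

    Assumes s > d / pi, so that the second denominator is positive.
    """
    inv_pi = 0.0 if math.isinf(pi) else 1.0 / pi
    dense = s / (2 * s + d)
    sparse = (s + d * (1.0 / p - inv_pi)) / (2 * s + d * (1 - 2 * inv_pi))
    return min(dense, sparse)

# Example: univariate density (d = 1), smoothness s = 2 measured in L^1, loss in L^6.
exponent = alpha(1, 2.0, 6.0, 1.0)
```

When p ≤ π the first term is the smaller one and the classical exponent s/(2s+d) is recovered; the second term only matters when the loss is measured in a stronger norm than the smoothness.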


We apply our generic results in Section 4 to two illustrative examples. We consider in Section 4.1
the growth-fragmentation model as studied in Doumic et al. [26], where we estimate the
size-dependent splitting rate of the model as a function of the invariant measure of an associated BMC
in Theorem 11. This enables us to extend the recent results of Doumic et al. in several directions:
adaptive estimation, extension of the smoothness classes and the loss functions considered, and
also a proof of a minimax lower bound. In Section 4.2, we show how bifurcating autoregressive
models (BAR) as developed for instance in de Saporta et al. [8] and Bitseki Penda and Olivier
[13] are embedded into our generic framework of estimation. A numerical illustration highlights the
feasibility of our procedure in practice and is presented in Section 4.3. The proofs are postponed
to Section 5.


2. Deviations inequalities for empirical means 


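In the sequel, we fix a (measurable) subset $\mathcal{D} \subseteq \mathcal{S}$ that will be later needed for statistical
purposes. We need some regularity on the $\mathbb{T}$-transition $\mathcal{P}$ via its mean transition $\mathcal{Q} = \frac{1}{2}(\mathcal{P}_0 + \mathcal{P}_1)$.

Assumption 2. The family $\{\mathcal{Q}(x, dy), x \in \mathcal{S}\}$ is dominated by a common sigma-finite measure
$\mathfrak{n}(dy)$. We have (abusing notation slightly)

$$\mathcal{Q}(x, dy) = \mathcal{Q}(x,y)\,\mathfrak{n}(dy) \quad\text{for every } x \in \mathcal{S},$$

for some $\mathcal{Q} : \mathcal{S}^2 \to [0,\infty)$ such that

$$|\mathcal{Q}|_{\mathcal{D}} = \sup_{x\in\mathcal{S},\, y\in\mathcal{D}} \mathcal{Q}(x,y) < \infty.$$

An invariant probability measure for $\mathcal{Q}$ is a probability $\nu$ on $(\mathcal{S}, \mathfrak{S})$ such that $\nu\mathcal{Q} = \nu$ where
$\nu\mathcal{Q}(dy) = \int_{x\in\mathcal{S}} \nu(dx)\mathcal{Q}(x, dy)$. We set

$$\mathcal{Q}^r(x, dy) = \int_{z\in\mathcal{S}} \mathcal{Q}(x, dz)\,\mathcal{Q}^{r-1}(z, dy) \quad\text{with}\quad \mathcal{Q}^0(x, dy) = \delta_x(dy)$$

for the $r$-th iteration of $\mathcal{Q}$. For a function $g : \mathcal{S}^d \to \mathbb{R}$ with $d = 1, 2, 3$ and $1 \le p \le \infty$, we denote
by $|g|_p$ its $L^p$-norm w.r.t. the measure $\mathfrak{n}^{\otimes d}$, allowing for the value $|g|_p = \infty$ if $g \notin L^p(\mathfrak{n}^{\otimes d})$. The
same notation applies to a function $g : \mathcal{D}^d \to \mathbb{R}$ tacitly considered as a function from $\mathcal{S}^d \to \mathbb{R}$ by
setting $g(x) = 0$ for $x \in \mathcal{S}^d \setminus \mathcal{D}^d$.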

Assumption 3. The mean transition $\mathcal{Q}$ admits a unique invariant probability measure $\nu$ and there
exist $R > 0$ and $0 < \rho < 1/2$ such that

$$\Big|\mathcal{Q}^m g(x) - \int_{\mathcal{S}} g\,d\nu\Big| \le R|g|_\infty \rho^m, \quad x \in \mathcal{S},\ m \ge 0,$$

for every $g$ integrable w.r.t. $\nu$.

Assumption 3 is a uniform geometric ergodicity condition that can be verified in most applications
using the theory of Meyn and Tweedie [36]. The ergodicity rate should be small enough
($\rho < 1/2$) and this point is crucial for the proofs. However this is sometimes delicate to check in
applications and we refer to Hairer and Mattingly [29] for an explicit control of the ergodicity rate.


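Our first result is a deviation inequality for empirical means over $\mathbb{G}_n$ or $\mathbb{T}_n$. We need some
notation. Let

$\kappa_1 = \kappa_1(\mathcal{Q}, \mathcal{D}) = 32\max\{|\mathcal{Q}|_{\mathcal{D}},\ 4|\mathcal{Q}|^2_{\mathcal{D}},\ 4R^2(1+\rho)^2\}$,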

K 2 =K 2 (Q) = ^ max {l -I- Rp, R{1 -bp)}, 

$\kappa_3 = \kappa_3(\mathcal{Q}, \mathcal{D}) = 96\max\{|\mathcal{Q}|_{\mathcal{D}},\ 16|\mathcal{Q}|^2_{\mathcal{D}},\ 4R^2(1+\rho)^2(1-2\rho)^{-2}\}$,

K 4 =K 4 (Q) = max {l -b Rp, R{1 + p)(l — 2p)“^}, 

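where $|\mathcal{Q}|_{\mathcal{D}} = \sup_{x\in\mathcal{S},\, y\in\mathcal{D}} \mathcal{Q}(x,y)$ is defined in Assumption 2. For $g : \mathcal{D} \to \mathbb{R}$, define $\Sigma_{1,1}(g) = |g|_2^2$
and for $n \ge 2$,

$$\Sigma_{1,n}(g) = |g|_2^2 + \min_{1\le\ell\le n-1}\big(|g|_1^2\, 2^{\ell} + |g|_\infty^2\, 2^{-\ell}\big). \tag{3}$$

Define also $\Sigma_{2,1}(g) = |\mathcal{P}g^2|_1$ and for $n \ge 2$,

$$\Sigma_{2,n}(g) = |\mathcal{P}g^2|_1 + \min_{1\le\ell\le n-1}\big(|\mathcal{P}g|_1^2\, 2^{\ell} + |\mathcal{P}g|_\infty^2\, 2^{-\ell}\big). \tag{4}$$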

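Theorem 4. Work under Assumptions 2 and 3. Then, for every $n \ge 1$ and every $g : \mathcal{D} \subseteq \mathcal{S} \to \mathbb{R}$
integrable w.r.t. $\nu$, the following inequalities hold true:

(i) For any $\delta > 0$ such that $\delta \ge 4R|g|_\infty|\mathbb{G}_n|^{-1}$, we have

$$\mathbb{P}\Big(\frac{1}{|\mathbb{G}_n|}\sum_{u\in\mathbb{G}_n} g(X_u) - \int g\,d\nu \ge \delta\Big) \le \exp\Big(\frac{-|\mathbb{G}_n|\delta^2}{\kappa_1\Sigma_{1,n}(g) + \kappa_2|g|_\infty\delta}\Big).$$

(ii) For any $\delta > 0$ such that $\delta \ge 4R(1-2\rho)^{-1}|g|_\infty|\mathbb{T}_n|^{-1}$, we have

$$\mathbb{P}\Big(\frac{1}{|\mathbb{T}_n|}\sum_{u\in\mathbb{T}_n} g(X_u) - \int g\,d\nu \ge \delta\Big) \le \exp\Big(\frac{-|\mathbb{T}_n|\delta^2}{\kappa_3\Sigma_{1,n}(g) + \kappa_4|g|_\infty\delta}\Big).$$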

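Theorem 5. Work under Assumptions 2 and 3. Then, for every $n \ge 2$ and for every $g : \mathcal{D}^3 \subseteq \mathcal{S}^3 \to \mathbb{R}$
such that $\mathcal{P}g$ is well defined and integrable w.r.t. $\nu$, the following inequalities hold true:

(i) For any $\delta > 0$ such that $\delta \ge 4R|\mathcal{P}g|_\infty|\mathbb{G}_n|^{-1}$, we have

$$\mathbb{P}\Big(\frac{1}{|\mathbb{G}_n|}\sum_{u\in\mathbb{G}_n} g(X_u, X_{u0}, X_{u1}) - \int \mathcal{P}g\,d\nu \ge \delta\Big) \le \exp\Big(\frac{-|\mathbb{G}_n|\delta^2}{\kappa_1\Sigma_{2,n}(g) + \kappa_2|g|_\infty\delta}\Big).$$

(ii) For any $\delta > 0$ such that $\delta \ge 4(nR|\mathcal{P}g|_\infty + |g|_\infty)|\mathbb{T}_{n-1}|^{-1}$, we have

$$\mathbb{P}\Big(\frac{1}{|\mathbb{T}_{n-1}|}\sum_{u\in\mathbb{T}_{n-1}} g(X_u, X_{u0}, X_{u1}) - \int \mathcal{P}g\,d\nu \ge \delta\Big) \le \exp\Big(\frac{-n^{-1}|\mathbb{T}_{n-1}|\delta^2}{\kappa_1\Sigma_{2,n-1}(g) + \kappa_2|g|_\infty\delta}\Big).$$

A few remarks are in order: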

1) Theorem 4 (i) is a direct consequence of Theorem 5 (i) but Theorem 4 (ii) is not a corollary of
Theorem 5 (ii): we note that a slow term of order $n^{-1} \approx (\log|\mathbb{T}_n|)^{-1}$ comes in Theorem 5 (ii).

2) Bitseki Penda et al. in [11] study similar Hoeffding-type deviations inequalities for functionals
of bifurcating Markov chains under ergodicity assumption and for uniformly bounded functions.
In the present work and for statistical purposes, we need Bernstein-type deviations inequalities
which require a specific treatment that cannot be obtained from a direct adaptation of [11]. In
particular, we apply our results to multivariate wavelets test functions $\psi^d_\lambda$ that are well localised
but unbounded, and a fine control of the conditional variance $\Sigma_{i,n}(\psi^d_\lambda)$, $i = 1,2$ is of crucial
importance.

3) Assumption 3 about the uniform geometric ergodicity is quite strong, although satisfied in the
two examples developed in Section 4 (at the cost however of assuming that the splitting rate of
the growth-fragmentation model has bounded support in Section 4.1). Presumably, a way to relax
this restriction would be to require a weaker geometric ergodicity condition of the form

$$\Big|\mathcal{Q}^m g(x) - \int_{\mathcal{S}} g\,d\nu\Big| \le R|g|_\infty V(x)\rho^m, \quad x \in \mathcal{S},\ m \ge 0,$$

for some Lyapunov function $V : \mathcal{S} \to [1,\infty)$. Analogous results could then be obtained via
transportation information inequalities for bifurcating Markov chains with a similar approach as
in Gao et al. [27], but this lies beyond the scope of the paper.

3. Statistical estimation 

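In this section, we take $(\mathcal{S}, \mathfrak{S}) = (\mathbb{R}, \mathcal{B}(\mathbb{R}))$. As in the previous section, we fix a compact interval
$\mathcal{D} \subseteq \mathcal{S}$. The following assumption will be needed here.

Assumption 6. The family $\{\mathcal{P}(x, dy\,dz), x \in \mathcal{S}\}$ is dominated w.r.t. the Lebesgue measure on
$(\mathbb{R}^2, \mathcal{B}(\mathbb{R}^2))$. We have (abusing notation slightly)

$$\mathcal{P}(x, dy\,dz) = \mathcal{P}(x,y,z)\,dy\,dz \quad\text{for every } x \in \mathcal{S}$$

for some $\mathcal{P} : \mathcal{S}^3 \to [0,\infty)$ such that

$$|\mathcal{P}|_{\mathcal{D},1} = \int_{\mathcal{S}^2} \sup_{x\in\mathcal{D}} \mathcal{P}(x,y,z)\,dy\,dz < \infty.$$

Under Assumptions 2, 3 and 6 with $\mathfrak{n}(dy) = dy$, we have (abusing notation slightly)

$$\mathcal{P}(x, dy\,dz) = \mathcal{P}(x,y,z)\,dy\,dz, \quad \mathcal{Q}(x, dy) = \mathcal{Q}(x,y)\,dy \quad\text{and}\quad \nu(dx) = \nu(x)\,dx.$$

For some $n \ge 1$, we observe $\mathbb{X}^n = (X_u)_{u\in\mathbb{T}_n}$ and we aim at constructing nonparametric estimators
of $x \mapsto \nu(x)$, $(x,y) \mapsto \mathcal{Q}(x,y)$ and $(x,y,z) \mapsto \mathcal{P}(x,y,z)$ for $x,y,z \in \mathcal{D}$. To that end, we use regular
wavelet bases adapted to the domain $\mathcal{D}^d$ for $d = 1, 2, 3$.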

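3.1. Atomic decompositions and wavelets. Wavelet bases $(\psi^d_\lambda)_\lambda$ adapted to a domain $\mathcal{D}^d$ in
$\mathbb{R}^d$, for $d = 1,2,3$, are documented in numerous textbooks, see e.g. Cohen [17]. The multi-index
$\lambda$ concatenates the spatial index and the resolution level $j = |\lambda|$. We set $\Lambda_j = \{\lambda, |\lambda| = j\}$ and
$\Lambda = \cup_{j\ge -1}\Lambda_j$. Thus, for $g \in L^\pi(\mathcal{D}^d)$ for some $\pi \in (0,\infty]$, we have

$$g = \sum_{j\ge -1}\sum_{\lambda\in\Lambda_j} g_\lambda\psi^d_\lambda = \sum_{\lambda\in\Lambda} g_\lambda\psi^d_\lambda, \quad\text{with}\quad g_\lambda = \langle g, \psi^d_\lambda\rangle,$$

where we have set $j = -1$ in order to incorporate the low frequency part of the decomposition and
$\langle g, \psi^d_\lambda\rangle = \int g\psi^d_\lambda$ denotes the inner product in $L^2(\mathbb{R}^d)$. From now on, the basis $(\psi^d_\lambda)_\lambda$ is fixed. For
$s > 0$ and $\pi \in (0,\infty]$, $g$ belongs to $B^s_{\pi,\infty}(\mathcal{D}^d)$ if the following norm is finite:

$$\|g\|_{B^s_{\pi,\infty}(\mathcal{D}^d)} = \sup_{j\ge -1} 2^{j(s + d(1/2 - 1/\pi))}\Big(\sum_{\lambda\in\Lambda_j} |\langle g, \psi^d_\lambda\rangle|^\pi\Big)^{1/\pi} \tag{5}$$

with the usual modification if $\pi = \infty$. Precise connection between this definition of Besov norm
and more standard ones can be found in [17]. Given a basis $(\psi^d_\lambda)_\lambda$, there exists $\sigma > 0$ such that for
$\pi \ge 1$ and $s \le \sigma$ the Besov space defined by (5) exactly matches the usual definition in terms of
moduli of smoothness for $g$. The index $\sigma$ can be taken arbitrarily large. The additional properties
of the wavelet basis that we need are summarized in the next assumption.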

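Assumption 7. For $p \ge 1$,

$$\|\psi^d_\lambda\|^p_{L^p} \sim 2^{|\lambda| d(p/2 - 1)}, \tag{6}$$

for some $\sigma > 0$ and for all $s \le \sigma$, $j_0 \ge 0$,

$$\Big\|g - \sum_{j\le j_0}\sum_{\lambda\in\Lambda_j} g_\lambda\psi^d_\lambda\Big\|_{L^p} \lesssim 2^{-j_0 s}\|g\|_{B^s_{p,\infty}(\mathcal{D}^d)}, \tag{7}$$

for any subset $\Lambda_0 \subset \Lambda$,

$$\int_{\mathcal{D}^d}\Big(\sum_{\lambda\in\Lambda_0} |\psi^d_\lambda(x)|^2\Big)^{p/2} dx \sim \sum_{\lambda\in\Lambda_0} \|\psi^d_\lambda\|^p_{L^p}. \tag{8}$$

If $p > 1$, for any sequence $(u_\lambda)_{\lambda\in\Lambda}$,

$$\Big\|\Big(\sum_{\lambda\in\Lambda} |u_\lambda\psi^d_\lambda|^2\Big)^{1/2}\Big\|_{L^p} \sim \Big\|\sum_{\lambda\in\Lambda} u_\lambda\psi^d_\lambda\Big\|_{L^p}. \tag{9}$$

The symbol $\sim$ means inequality in both ways, up to a constant depending on $p$ and $\mathcal{D}$ only.
The property (7) reflects that our definition (5) of Besov spaces matches the definition in terms
of linear approximation. Property (9) reflects an unconditional basis property, see Kerkyacharian
and Picard [32], De Vore et al. [21] and (8) is referred to as a superconcentration inequality, or
Temlyakov property [32]. The formulation of (8)-(9) in the context of statistical estimation is
posterior to the original papers of Donoho and Johnstone [22, 23] and Donoho et al. [25, 24] and
is due to Kerkyacharian and Picard [32]. The existence of compactly supported wavelet bases
satisfying Assumption 7 is discussed in Meyer [35], see also Cohen [17].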


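3.2. Estimation of the invariant density $\nu$. Recall that we estimate $x \mapsto \nu(x)$ for $x \in \mathcal{D}$,
taken as a compact interval in $\mathcal{S} \subseteq \mathbb{R}$. We approximate the representation

$$\nu(x) = \sum_{\lambda\in\Lambda} \nu_\lambda\psi^1_\lambda(x), \quad \nu_\lambda = \langle\nu, \psi^1_\lambda\rangle$$

by

$$\widehat{\nu}_n(x) = \sum_{|\lambda|\le J} \widehat{\nu}_{\lambda,n}\psi^1_\lambda(x),$$

with

$$\widehat{\nu}_{\lambda,n} = \mathcal{T}_{\lambda,\eta}\Big(\frac{1}{|\mathbb{T}_n|}\sum_{u\in\mathbb{T}_n} \psi^1_\lambda(X_u)\Big),$$

and $\mathcal{T}_{\lambda,\eta}(x) = x\mathbf{1}_{\{|x|\ge\eta\}}$ denotes the standard threshold operator (with $\mathcal{T}_{\lambda,\eta}(x) = x$ for the low
frequency part when $\lambda \in \Lambda_{-1}$). Thus $\widehat{\nu}_n$ is specified by the maximal resolution level $J$ and the
threshold $\eta$.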
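The thresholding estimator just described can be sketched in a few lines of Python. This is a simplified illustration under assumptions that differ from the paper: it uses the Haar basis on [0, 1) (which is not regular enough to satisfy Assumption 7 for large smoothness), it feeds the procedure an i.i.d. sample as a stand-in for the traits $(X_u)_{u\in\mathbb{T}_n}$, and the function names are hypothetical.

```python
import math
import random

def haar_psi(j, k, x):
    """L2-normalised Haar wavelet psi_{j,k}, supported on [k 2^-j, (k+1) 2^-j)."""
    t = (2 ** j) * x - k
    if 0.0 <= t < 0.5:
        return 2 ** (j / 2)
    if 0.5 <= t < 1.0:
        return -(2 ** (j / 2))
    return 0.0

def fit_haar_threshold(data, J, eta):
    """Empirical wavelet coefficients up to level J, hard-thresholded at eta."""
    n = len(data)
    # Low-frequency (scaling) coefficient: kept without thresholding.
    coarse = sum(1.0 for x in data if 0.0 <= x < 1.0) / n
    kept = {}
    for j in range(J + 1):
        sums = [0.0] * (2 ** j)
        for x in data:
            if 0.0 <= x < 1.0:
                k = int((2 ** j) * x)  # only psi_{j,k} with this k is non-zero at x
                sums[k] += haar_psi(j, k, x)
        for k, total in enumerate(sums):
            coef = total / n
            if abs(coef) >= eta:  # hard thresholding: keep or kill
                kept[(j, k)] = coef
    return coarse, kept

def evaluate(coarse, kept, x):
    """Value at x of the thresholded estimator."""
    if not 0.0 <= x < 1.0:
        return 0.0
    return coarse + sum(c * haar_psi(j, k, x) for (j, k), c in kept.items())

rng = random.Random(0)
data = [rng.betavariate(2, 5) for _ in range(2000)]  # stand-in sample on (0, 1)
J = 4
eta = 1.0 * math.sqrt(math.log(len(data)) / len(data))
coarse, kept = fit_haar_threshold(data, J, eta)
```

Because every Haar wavelet integrates to zero, the estimator always integrates to the low-frequency coefficient, whatever coefficients survive the threshold; it may however take negative values.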

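Theorem 8. Work under Assumptions 2 and 3 with $\mathfrak{n}(dx) = dx$. Specify $\widehat{\nu}_n$ with

$$J = \log_2\frac{|\mathbb{T}_n|}{\log|\mathbb{T}_n|} \quad\text{and}\quad \eta = c\sqrt{\log|\mathbb{T}_n|/|\mathbb{T}_n|}$$

for some $c > 0$. For every $\pi \in (0,\infty]$, $s > 1/\pi$ and $p \ge 1$, for large enough $n$ and $c$, the following
estimate holds

$$\Big(\mathbb{E}\big[\|\widehat{\nu}_n - \nu\|^p_{L^p(\mathcal{D})}\big]\Big)^{1/p} \lesssim \Big(\frac{\log|\mathbb{T}_n|}{|\mathbb{T}_n|}\Big)^{\alpha_1(s,p,\pi)},$$

with $\alpha_1(s,p,\pi) = \min\big\{\frac{s}{2s+1}, \frac{s+1/p-1/\pi}{2s+1-2/\pi}\big\}$, up to a constant that depends on $s, p, \pi$, $\|\nu\|_{B^s_{\pi,\infty}(\mathcal{D})}$, $\rho$,
$R$ and $|\mathcal{Q}|_{\mathcal{D}}$ and that is continuous in its arguments.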
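For orientation, the tuning of Theorem 8 is easy to evaluate for a given number of generations. The helper below is a hedged sketch: it takes the maximal level as $\log_2(|\mathbb{T}_n|/\log|\mathbb{T}_n|)$ and the threshold as $c\sqrt{\log|\mathbb{T}_n|/|\mathbb{T}_n|}$, and it rounds the level down to an integer, which is an implementation choice not stated in the text. The constant `c` is left to the user.

```python
import math

def tree_size(n):
    """Number of traits observed up to generation n: |T_n| = 2^(n+1) - 1."""
    return 2 ** (n + 1) - 1

def tuning(n, c=1.0):
    """Maximal resolution level (rounded down, an assumption) and threshold for |T_n| data."""
    size = tree_size(n)
    level = math.log2(size / math.log(size))
    eta = c * math.sqrt(math.log(size) / size)
    return int(level), eta

J, eta = tuning(15)  # 15 generations, i.e. 65535 observed traits
```

With 15 generations this gives a maximal level of 12 and a threshold close to 0.013 when `c = 1`.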

Two remarks are in order:

1) The upper-rate of convergence is the classical minimax rate in density estimation. We infer that
our estimator is nearly optimal in a minimax sense as follows from Theorem 2 in Donoho et al.
[25] applied to the class $\mathcal{Q}(x,y)dy = \nu(y)dy$, i.e. in the particular case when we have i.i.d. $X_u$'s.
We highlight the fact that $n$ represents here the number of observed generations in the tree, which
means that we observe $|\mathbb{T}_n| = 2^{n+1} - 1$ traits.

2) The estimator is smooth-adaptive in the following sense: for every $s_0 > 0$, $0 < \rho_0 < 1/2$,
$R_0 > 0$ and $\mathcal{Q}_0 > 0$, define the sets $\mathcal{A}(s_0) = \{(s,\pi),\ s \ge s_0,\ s_0 \ge 1/\pi\}$ and

$$\mathcal{Q}(\rho_0, R_0, \mathcal{Q}_0) = \{\mathcal{Q} \text{ such that } \rho \le \rho_0,\ R \le R_0,\ |\mathcal{Q}|_{\mathcal{D}} \le \mathcal{Q}_0\},$$

where $\mathcal{Q}$ is taken among mean transitions for which Assumption 3 holds. Then, for every $C > 0$,
there exists $c^\star = c^\star(\mathcal{D}, p, s_0, \rho_0, \mathcal{Q}_0, C)$ such that $\widehat{\nu}_n$ specified with $c^\star$ satisfies

$$\sup_n \sup_{(\nu,\mathcal{Q})} \Big(\frac{|\mathbb{T}_n|}{\log|\mathbb{T}_n|}\Big)^{p\alpha_1(s,p,\pi)} \mathbb{E}\big[\|\widehat{\nu}_n - \nu\|^p_{L^p(\mathcal{D})}\big] < \infty$$

where the supremum is taken among $(\nu, \mathcal{Q})$ such that $\nu\mathcal{Q} = \nu$ with $\mathcal{Q} \in \mathcal{Q}(\rho_0, R_0, \mathcal{Q}_0)$ and
$\|\nu\|_{B^s_{\pi,\infty}(\mathcal{D})} \le C$. In particular, $\widehat{\nu}_n$ achieves the (near) optimal rate of convergence over Besov balls
simultaneously for all $(s,\pi) \in \mathcal{A}(s_0)$. Analogous smoothness adaptive results hold for Theorems 9, 10 and 11
below.

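3.3. Estimation of the density of the mean transition $\mathcal{Q}$. In this section we estimate $(x,y) \mapsto
\mathcal{Q}(x,y)$ for $(x,y) \in \mathcal{D}^2$ and $\mathcal{D}$ is a compact interval in $\mathcal{S} \subseteq \mathbb{R}$. In a first step, we estimate the density

$$f_{\mathcal{Q}}(x,y) = \nu(x)\mathcal{Q}(x,y)$$

of the distribution of $(X_{u^-}, X_u)$ when $\mathcal{L}(X_\emptyset) = \nu$ (a restriction we do not need here) by

$$\widehat{f}_n(x,y) = \sum_{|\lambda|\le J} \widehat{f}_{\lambda,n}\psi^2_\lambda(x,y),$$

with

$$\widehat{f}_{\lambda,n} = \mathcal{T}_{\lambda,\eta}\Big(\frac{1}{|\mathbb{T}^\star_n|}\sum_{u\in\mathbb{T}^\star_n} \psi^2_\lambda(X_{u^-}, X_u)\Big),$$

and $\mathcal{T}_{\lambda,\eta}(\cdot)$ is the hard-threshold estimator defined in Section 3.2 and $\mathbb{T}^\star_n = \mathbb{T}_n \setminus \mathbb{G}_0$. We can now
estimate the density $\mathcal{Q}(x,y)$ of the mean transition probability by

$$\widehat{\mathcal{Q}}_n(x,y) = \frac{\widehat{f}_n(x,y)}{\max\{\widehat{\nu}_n(x), \varpi\}} \tag{10}$$

for some threshold $\varpi > 0$. Thus the estimator $\widehat{\mathcal{Q}}_n$ is specified by $J$, $\eta$ and $\varpi$. Define also

$$m(\nu) = \inf_x \nu(x) \tag{11}$$

where the infimum is taken among all $x$ such that $(x,y) \in \mathcal{D}^2$ for some $y$.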

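Theorem 9. Work under Assumptions 2 and 3 with $\mathfrak{n}(dx) = dx$. Specify $\widehat{\mathcal{Q}}_n$ with

$$J = \frac{1}{2}\log_2\frac{|\mathbb{T}_n|}{\log|\mathbb{T}_n|} \quad\text{and}\quad \eta = c\sqrt{(\log|\mathbb{T}_n|)^2/|\mathbb{T}_n|}$$

for some $c > 0$ and $\varpi > 0$. For every $\pi \in [1,\infty]$, $s > 2/\pi$ and $p \ge 1$, for large enough $n$ and $c$ and
small enough $\varpi$, the following estimate holds

$$\Big(\mathbb{E}\big[\|\widehat{\mathcal{Q}}_n - \mathcal{Q}\|^p_{L^p(\mathcal{D}^2)}\big]\Big)^{1/p} \lesssim \Big(\frac{(\log|\mathbb{T}_n|)^2}{|\mathbb{T}_n|}\Big)^{\alpha_2(s,p,\pi)}, \tag{12}$$

with $\alpha_2(s,p,\pi) = \min\big\{\frac{s}{2s+2}, \frac{s+2(1/p-1/\pi)}{2s+2(1-2/\pi)}\big\}$, provided $m(\nu) \ge \varpi > 0$ and up to a constant that
depends on $s, p, \pi$, $\|\mathcal{Q}\|_{B^s_{\pi,\infty}(\mathcal{D}^2)}$ and that is continuous in its arguments.

This rate is moreover (nearly) optimal: define $\varepsilon_2 = s\pi - (p - \pi)$. We have

$$\inf_{\widehat{\mathcal{Q}}_n}\sup_{\mathcal{Q}} \Big(\mathbb{E}\big[\|\widehat{\mathcal{Q}}_n - \mathcal{Q}\|^p_{L^p(\mathcal{D}^2)}\big]\Big)^{1/p} \gtrsim \begin{cases} |\mathbb{T}_n|^{-\alpha_2(s,p,\pi)} & \text{if } \varepsilon_2 > 0, \\ \Big(\dfrac{\log|\mathbb{T}_n|}{|\mathbb{T}_n|}\Big)^{\alpha_2(s,p,\pi)} & \text{if } \varepsilon_2 \le 0, \end{cases}$$

where the infimum is taken among all estimators of $\mathcal{Q}$ based on $(X_u)_{u\in\mathbb{T}_n}$ and the supremum is
taken among all $\mathcal{Q}$ such that $\|\mathcal{Q}\|_{B^s_{\pi,\infty}(\mathcal{D}^2)} \le C$ and $m(\nu) \ge C'$ for some $C, C' > 0$.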


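3.4. Estimation of the density of the $\mathbb{T}$-transition $\mathcal{P}$. In this section we estimate $(x,y,z) \mapsto
\mathcal{P}(x,y,z)$ for $(x,y,z) \in \mathcal{D}^3$ and $\mathcal{D}$ is a compact interval in $\mathcal{S} \subseteq \mathbb{R}$. In a first step, we estimate the
density

$$f_{\mathcal{P}}(x,y,z) = \nu(x)\mathcal{P}(x,y,z)$$

of the distribution of $(X_u, X_{u0}, X_{u1})$ (when $\mathcal{L}(X_\emptyset) = \nu$) by

$$\widehat{f}_n(x,y,z) = \sum_{|\lambda|\le J} \widehat{f}_{\lambda,n}\psi^3_\lambda(x,y,z),$$

with

$$\widehat{f}_{\lambda,n} = \mathcal{T}_{\lambda,\eta}\Big(\frac{1}{|\mathbb{T}_{n-1}|}\sum_{u\in\mathbb{T}_{n-1}} \psi^3_\lambda(X_u, X_{u0}, X_{u1})\Big),$$

and $\mathcal{T}_{\lambda,\eta}(\cdot)$ is the hard-threshold estimator defined in Section 3.2. In the same way as in the
previous section, we can next estimate the density $\mathcal{P}$ of the $\mathbb{T}$-transition by

$$\widehat{\mathcal{P}}_n(x,y,z) = \frac{\widehat{f}_n(x,y,z)}{\max\{\widehat{\nu}_n(x), \varpi\}} \tag{13}$$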

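for some threshold $\varpi > 0$. Thus the estimator $\widehat{\mathcal{P}}_n$ is specified by $J$, $\eta$ and $\varpi$.

Theorem 10. Work under Assumptions 2, 3 and 6. Specify $\widehat{\mathcal{P}}_n$ with

$$J = \frac{1}{3}\log_2\frac{|\mathbb{T}_n|}{\log|\mathbb{T}_n|} \quad\text{and}\quad \eta = c\sqrt{(\log|\mathbb{T}_n|)^2/|\mathbb{T}_n|}$$

for some $c > 0$ and $\varpi > 0$. For every $\pi \in [1,\infty]$, $s > 3/\pi$ and $p \ge 1$, for large enough $n$ and $c$
and small enough $\varpi$, the following estimate holds

$$\Big(\mathbb{E}\big[\|\widehat{\mathcal{P}}_n - \mathcal{P}\|^p_{L^p(\mathcal{D}^3)}\big]\Big)^{1/p} \lesssim \Big(\frac{(\log|\mathbb{T}_n|)^2}{|\mathbb{T}_n|}\Big)^{\alpha_3(s,p,\pi)}, \tag{14}$$

with $\alpha_3(s,p,\pi) = \min\big\{\frac{s}{2s+3}, \frac{s+3(1/p-1/\pi)}{2s+3(1-2/\pi)}\big\}$, provided $m(\nu) \ge \varpi > 0$ and up to a constant that
depends on $s, p, \pi$, $\|\mathcal{P}\|_{B^s_{\pi,\infty}(\mathcal{D}^3)}$ and $m(\nu)$ and that is continuous in its arguments.

This rate is moreover (nearly) optimal: define $\varepsilon_3 = s\pi - \frac{3}{2}(p - \pi)$. We have

$$\inf_{\widehat{\mathcal{P}}_n}\sup_{\mathcal{P}} \Big(\mathbb{E}\big[\|\widehat{\mathcal{P}}_n - \mathcal{P}\|^p_{L^p(\mathcal{D}^3)}\big]\Big)^{1/p} \gtrsim \begin{cases} |\mathbb{T}_n|^{-\alpha_3(s,p,\pi)} & \text{if } \varepsilon_3 > 0, \\ \Big(\dfrac{\log|\mathbb{T}_n|}{|\mathbb{T}_n|}\Big)^{\alpha_3(s,p,\pi)} & \text{if } \varepsilon_3 \le 0, \end{cases}$$

where the infimum is taken among all estimators of $\mathcal{P}$ based on $(X_u)_{u\in\mathbb{T}_n}$ and the supremum is
taken among all $\mathcal{P}$ such that $\|\mathcal{P}\|_{B^s_{\pi,\infty}(\mathcal{D}^3)} \le C$ and $m(\nu) \ge C'$ for some $C, C' > 0$.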


4. Applications 

4.1. Estimation of the size-dependent splitting rate in a growth-fragmentation model.

Recently, Doumic et al. [26] have studied the problem of estimating nonparametrically the
size-dependent splitting rate in growth-fragmentation models (see e.g. the textbook of Perthame [37]).
Stochastically, these are piecewise deterministic Markov processes on trees that model the evolution
of a population of cells or bacteria: to each node (or cell) $u \in \mathbb{T}$, we associate as trait $X_u \in \mathcal{S} \subseteq
(0,\infty)$ the size at birth of the cell $u$. The evolution mechanism is described as follows: each cell
grows exponentially with a common rate $\tau > 0$. A cell of size $x$ splits into two newborn cells of size
$x/2$ each (thus $X_{u0} = X_{u1}$ here), with a size-dependent splitting rate $B(x)$ for some $B : \mathcal{S} \to [0,\infty)$.
Two newborn cells start a new life independently of each other. If $\zeta_u$ denotes the lifetime of the
cell $u$, we thus have

$$\mathbb{P}\big(\zeta_u \in [t, t+dt) \,\big|\, \zeta_u \ge t, X_u = x\big) = B\big(x\exp(\tau t)\big)dt \tag{15}$$

and

$$X_u = \tfrac{1}{2}X_{u^-}\exp(\tau\zeta_{u^-}) \tag{16}$$

so that (15) and (16) entirely determine the evolution of the population. We are interested in
estimating $x \mapsto B(x)$ for $x \in \mathcal{D}$ where $\mathcal{D} \subseteq \mathcal{S}$ is a given compact interval. The process $(X_u)_{u\in\mathbb{T}}$ is
a bifurcating Markov chain with state space $\mathcal{S}$ and $\mathbb{T}$-transition any version of

$$\mathcal{P}_B(x, dy\,dz) = \mathbb{P}\big(X_{u0} \in dy, X_{u1} \in dz \,\big|\, X_u = x\big).$$

Moreover, using (15) and (16) (see for instance the derivation of Equation (11) in [26]), it is not
difficult to check that

$$\mathcal{P}_B(x, dy\,dz) = \mathcal{Q}_B(x, dy) \otimes \delta_y(dz)$$

where $\delta_y$ denotes the Dirac mass at $y$ and

$$\mathcal{Q}_B(x, dy) = \frac{B(2y)}{\tau y}\exp\Big(-\int_{x/2}^{y} \frac{B(2z)}{\tau z}dz\Big)\mathbf{1}_{\{y\ge x/2\}}\,dy. \tag{17}$$

If we assume moreover that $x \mapsto B(x)$ is continuous, then we have Assumption 2 with $\mathcal{Q} = \mathcal{Q}_B$
and $\mathfrak{n}(dx) = dx$.
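The mean transition of this model can be sampled by inverse transform once a splitting rate is fixed. The sketch below is an illustration under assumptions: it reads the density of $\mathcal{Q}_B(x,\cdot)$, as derived from (15) and (16), as $y \mapsto \frac{B(2y)}{\tau y}\exp\big(-\int_{x/2}^{y}\frac{B(2z)}{\tau z}dz\big)$ on $y \ge x/2$, and it uses the hypothetical linear rate $B(x) = x$, which is not one of the paper's examples, because it makes the integral explicit: the integrand equals $2/\tau$, so the size at birth of the children is $x/2$ plus an exponential variable with rate $2/\tau$.

```python
import math
import random

def sample_child_size(x, tau, rng):
    """Common size at birth of both children of a cell born with size x.

    Hypothetical splitting rate B(x) = x: the law of the child size is a
    shifted exponential, Y = x/2 + Exp(rate 2/tau).
    """
    return x / 2.0 + rng.expovariate(2.0 / tau)

def lifetime(x, y, tau):
    """Lifetime of the parent recovered from (16): y = x exp(tau * zeta) / 2."""
    return math.log(2.0 * y / x) / tau

rng = random.Random(0)
draws = [sample_child_size(1.0, 2.0, rng) for _ in range(20000)]
mean_child_size = sum(draws) / len(draws)  # theoretical value: x/2 + tau/2 = 1.5
```

For a general rate the same inverse-transform idea applies, but the cumulative hazard usually has to be integrated numerically or replaced by a rejection step.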


Now, let $\mathcal{S}$ be a bounded and open interval in $(0,\infty)$ such that $\sup\mathcal{S} > 2\inf\mathcal{S}$. Pick $r \in \mathcal{S}$,
$0 < L < \tau\log 2$ and introduce the function class

$$\mathcal{C}(r, L) = \Big\{B : \mathcal{S} \to [0,\infty),\ \int^{\sup\mathcal{S}} \frac{B(x)}{x}dx = \infty,\ \int_{\inf\mathcal{S}}^{r} \frac{B(x)}{x}dx \le L\Big\}.$$

By Theorem 1.3 in Hairer and Mattingly [29] and the explicit representation (17) for $\mathcal{Q}_B$, one can
check that for every $B \in \mathcal{C}(r,L)$, we have Assumption 3 with $\mathcal{Q} = \mathcal{Q}_B$. In particular, we comply
with the stringent requirement $\rho = \rho_B \le C(r,L)$ for some $C(r,L) < 1/2$, i.e. uniformly over
$\mathcal{C}(r,L)$. Finally, we know by Proposition 2 in Doumic et al. [26] (see in particular Equation (24))
that

$$B(x) = \frac{\tau x}{2}\,\frac{\nu_B(x/2)}{\int_{x/2}^{x}\nu_B(z)dz},$$

where $\nu_B$ denotes the unique invariant probability of the transition $\mathcal{Q} = \mathcal{Q}_B$. This yields a strategy
for estimating $x \mapsto B(x)$ via an estimator of $x \mapsto \nu_B(x)$. For a given compact interval $\mathcal{D} \subseteq \mathcal{S}$,
define

$$\widehat{B}_n(x) = \frac{\tau x}{2}\,\frac{\widehat{\nu}_n(x/2)}{\Big(\frac{1}{|\mathbb{T}_n|}\sum_{u\in\mathbb{T}_n}\mathbf{1}_{\{x/2\le X_u<x\}}\Big)\vee\varpi} \tag{18}$$

where $\widehat{\nu}_n$ is the wavelet thresholding estimator given in Section 3.2 specified by a maximal resolution
level $J$ and a threshold $\eta$, and $\varpi > 0$. As a consequence of Theorem 8 we obtain the following


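Theorem 11. Specify $\widehat{B}_n$ with

$$J = \frac{1}{2}\log_2\frac{|\mathbb{T}_n|}{\log|\mathbb{T}_n|} \quad\text{and}\quad \eta = c\sqrt{\log|\mathbb{T}_n|/|\mathbb{T}_n|}$$

for some $c > 0$. For every $B \in \mathcal{C}(r,L)$, $s > 0$, $\pi \in (0,\infty]$ and $p \ge 1$, large enough $n$ and $c$ and
small enough $\varpi$, the following estimate holds

$$\Big(\mathbb{E}\big[\|\widehat{B}_n - B\|^p_{L^p(\mathcal{D})}\big]\Big)^{1/p} \lesssim \Big(\frac{\log|\mathbb{T}_n|}{|\mathbb{T}_n|}\Big)^{\alpha_1(s,p,\pi)},$$

with $\alpha_1(s,p,\pi) = \min\big\{\frac{s}{2s+1}, \frac{s+1/p-1/\pi}{2s+1-2/\pi}\big\}$, up to a constant that depends on $s, p, \pi$, $\|B\|_{B^s_{\pi,\infty}(\mathcal{D})}$, $r$
and $L$ and that is continuous in its arguments.

This rate is moreover (nearly) optimal: define $\varepsilon_1 = s\pi - \frac{1}{2}(p - \pi)$. We have

$$\inf_{\widehat{B}_n}\sup_{B} \Big(\mathbb{E}\big[\|\widehat{B}_n - B\|^p_{L^p(\mathcal{D})}\big]\Big)^{1/p} \gtrsim \begin{cases} |\mathbb{T}_n|^{-\alpha_1(s,p,\pi)} & \text{if } \varepsilon_1 > 0, \\ \Big(\dfrac{\log|\mathbb{T}_n|}{|\mathbb{T}_n|}\Big)^{\alpha_1(s,p,\pi)} & \text{if } \varepsilon_1 \le 0, \end{cases}$$

where the infimum is taken among all estimators of $B$ based on $(X_u)_{u\in\mathbb{T}_n}$ and the supremum is
taken among all $B \in \mathcal{C}(r,L)$ such that $\|B\|_{B^s_{\pi,\infty}(\mathcal{D})} \le C$.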
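The plug-in estimator (18) of the splitting rate is simple to assemble once an estimator of the invariant density is available. The following Python sketch is illustrative and rests on assumptions: `nu_hat` is any callable estimate of the invariant density (for instance a wavelet thresholding estimator), the indicator in the denominator is read as counting traits in $[x/2, x)$, and the function name is hypothetical.

```python
def splitting_rate_estimator(traits, nu_hat, tau, varpi):
    """Return x -> (tau x / 2) * nu_hat(x / 2) / max(empirical mass of [x/2, x), varpi)."""
    if varpi <= 0:
        raise ValueError("varpi must be positive")
    n = len(traits)

    def b_hat(x):
        # Fraction of observed sizes at birth falling in [x/2, x).
        mass = sum(1 for t in traits if x / 2.0 <= t < x) / n
        return (tau * x / 2.0) * nu_hat(x / 2.0) / max(mass, varpi)

    return b_hat

# Toy usage with a constant stand-in for the density estimator (assumption).
b_hat = splitting_rate_estimator([1.0, 1.0, 1.0, 1.0], lambda x: 1.0, tau=2.0, varpi=0.1)
```

The denominator is a plain empirical frequency, so only the numerator requires thresholding; this is one way to see why the rate of Theorem 11 is the univariate one.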

Two remarks are in order: 


1) We improve on the results of Doumic et al. [26] in two directions: we have
smoothness-adaptation (in the sense described in Remark 2) after Theorem 8 in Section 3) for several loss
functions over various Besov smoothness classes, while [26] constructs a non-adaptive estimator
for Hölder smoothness in squared-error loss; moreover, we prove that the obtained rate is (nearly)
optimal in a minimax sense.

2) We unfortunately need to work under the quite stringent restriction that $\mathcal{S}$ is bounded in order
to obtain the uniform ergodicity Assumption 3, see Remark 3) after Theorem 5 in Section 2.


4.2. Bifurcating autoregressive process. Bifurcating autoregressive processes (BAR), first
introduced by Cowan and Staudte [16], are yet another stochastic model for understanding cell
division. The trait $X_u$ may represent the growth rate of a bacterium $u \in \mathbb{T}$ in a population of
Escherichia coli but other choices are obviously possible. Contrary to the growth-fragmentation
model of Section 4.1, the traits $(X_{u0}, X_{u1})$ of the two newborn cells differ and are linked through
the autoregressive dynamics

$$\begin{cases} X_{u0} = f_0(X_u) + \sigma_0(X_u)\varepsilon_{u0}, \\ X_{u1} = f_1(X_u) + \sigma_1(X_u)\varepsilon_{u1}, \end{cases} \tag{19}$$

initiated with $X_\emptyset$ and where

$$f_0, f_1 : \mathbb{R} \to \mathbb{R} \quad\text{and}\quad \sigma_0, \sigma_1 : \mathbb{R} \to (0,\infty)$$

are functions and $(\varepsilon_{u0}, \varepsilon_{u1})_{u\in\mathbb{T}}$ are i.i.d. noise variables with common density function $G : \mathbb{R}^2 \to
[0,\infty)$ that specify the model.


The process $(X_u)_{u\in\mathbb{T}}$ is a bifurcating Markov chain with state space $\mathcal{S} = \mathbb{R}$ and $\mathbb{T}$-transition

$$\mathcal{P}(x, dy\,dz) = G\big(\sigma_0(x)^{-1}(y - f_0(x)),\ \sigma_1(x)^{-1}(z - f_1(x))\big)dy\,dz. \tag{20}$$

This model can be seen as an adaptation of the nonlinear autoregressive model when the data have a
binary tree structure. The original BAR process in [16] is defined for linear link functions $f_0$ and
$f_1$ with $f_0 = f_1$. Several extensions have been studied from a parametric point of view, see e.g.
Basawa and Huggins [2, 3] and Basawa and Zhou [4, 5]. More recently, de Saporta et al. [8, 19]
introduce asymmetry and take into account missing data while Blandin [14], Bercu and Blandin
[7], and de Saporta et al. [20] study an extension with random coefficients. Bitseki Penda and
Djellout [10] prove deviation inequalities and moderate deviations for estimators of parameters in
linear BAR processes. From a nonparametric point of view, we mention the applications of [12]
(Section 4) where deviations inequalities are derived for the Nadaraya-Watson type estimators of
$f_0$ and $f_1$ (with constant and known functions $\sigma_0$, $\sigma_1$). A detailed nonparametric study of these
estimators is carried out in Bitseki Penda and Olivier [13].


We focus here on the nonparametric estimation of the characteristics of the tagged-branch chain
$\nu$ and $\mathcal{Q}$ and on the $\mathbb{T}$-transition $\mathcal{P}$, based on the observation of $(X_u)_{u\in\mathbb{T}_n}$ for some $n \ge 1$. Such an
approach can be helpful for the subsequent study of goodness-of-fit tests for instance, when one
needs to assess whether the data $(X_u)_{u\in\mathbb{T}}$ are generated by a model of the form (19) or not.


We set $G_0(x) = \int_{\mathcal{S}} G(x,y)dy$ and $G_1(y) = \int_{\mathcal{S}} G(x,y)dx$ for the marginals of $G$, and define, for
any $M > 0$,

$$\delta(M) = \min\Big\{\inf_{|x|\le M} G_0(x),\ \inf_{|x|\le M} G_1(x)\Big\}.$$

Assumption 12. For some $\ell > 0$ and $\underline{\sigma} > 0$, we have

$$\max\Big\{\sup_x |f_0(x)|,\ \sup_x |f_1(x)|\Big\} \le \ell < \infty$$

and

$$\min\Big\{\inf_x \sigma_0(x),\ \inf_x \sigma_1(x)\Big\} \ge \underline{\sigma} > 0.$$

Moreover, $G_0$ and $G_1$ are bounded and there exist $\mu > 0$ and $M > \ell/\underline{\sigma}$ such that $\delta\big((\mu + \ell)/\underline{\sigma}\big) > 0$
and $2(M\underline{\sigma} - \ell)\delta(M) > 1/2$.

Using that $G_0$ and $G_1$ are bounded, and (20), we readily check that Assumption 6 is satisfied. We also have Assumption 2 with $\mathfrak n(dx) = dx$ and

$\mathcal Q(x,y) = \tfrac12\big(G_0(y - f_0(x)) + G_1(y - f_1(x))\big).$

Assumption 12 implies Assumption 3 as well, as follows from a straightforward adaptation of
Lemma 25 in Bitseki Penda and Olivier [13]. Denoting by v the invariant probability of Q we also 
have m{v) > 0 with m{v) defined by (11), for every T) C see the proof of Lemma 24 in 

[13]. As a consequence, the results stated in Theorems 8, 9 and 10 of Section 3 carry over to the setting of BAR processes satisfying Assumption 12. We thus readily obtain smoothness-adaptive estimators for $\nu$, $\mathcal Q$ and $\mathcal P$ in this context and these results are new.

4.3. Numerical illustration. We focus on the growth-fragmentation model and reconstruct its size-dependent splitting rate. We consider a perturbation of the baseline splitting rate $B(x) = x/(5 - x)$ over the range $x \in \mathcal S = (0, 5)$ of the form

$\widetilde B(x) = B(x) + c\,T\big(2^{j}(x - \tfrac72)\big)$

with $(c, j) = (3,1)$ or $(c, j) = (9,4)$, and where $T(x) = (1 + x)\mathbf 1_{\{-1 \le x < 0\}} + (1 - x)\mathbf 1_{\{0 \le x \le 1\}}$ is a tent shaped function. Thus the trial splitting rate with parameter $(c,j) = (9,4)$ is more localized around $7/2$ and higher than the one associated with parameter $(c,j) = (3,1)$. One can easily check that both $B$ and $\widetilde B$ belong to the class $\mathcal C(r,L)$ for an appropriate choice of $(r,L)$. For a given $\widetilde B$, we simulate $M = 100$ Monte Carlo trees up to the generation $n = 15$. To do so, we draw the size at birth of the initial cell $X_\emptyset$ uniformly in the interval $[1.25, 2.25]$, we fix the growth rate $\tau = 2$ and, given a size at birth $X_u = x$, we pick $X_{u0}$ according to the density $y \mapsto \mathcal Q_{\widetilde B}(x,y)$ defined by (17) using a rejection sampling algorithm (with proposal density $y \mapsto \mathcal Q_B(x, y)$) and set $X_{u1} = X_{u0}$. Figure 1 illustrates quantitatively how fast the decorrelation on the tree occurs.


Computational aspects of statistical estimation using wavelets can be found in Härdle et al., Chapter 12 of [30]. We implement the estimator defined by (18) using the Matlab wavelet toolbox. We take a wavelet filter corresponding to compactly supported Daubechies wavelets of order 8. As specified in Theorem 11, the maximal resolution level $J$ is chosen as $\tfrac12 \log_2(|\mathbb T_n|/\log|\mathbb T_n|)$ and we threshold the coefficients $\widehat\nu_{\lambda,n}$ which are too small by hard thresholding. We choose the threshold proportional to $\sqrt{\log|\mathbb T_n|/|\mathbb T_n|}$ (and we calibrate the constant to 10 or 15 for respectively the two trial splitting rates, mainly by visual inspection). We evaluate $\widehat B_n$ on a regular grid of $\mathcal D = [1.5, 4.8]$ with mesh $\Delta x = |\mathbb T_n|^{-1/2}$. For each sample we compute the empirical error

$e_i = \dfrac{\|\widehat B_n^{(i)} - \widetilde B\|_{\Delta x}}{\|\widetilde B\|_{\Delta x}}, \quad i = 1, \ldots, M,$

where $\|\cdot\|_{\Delta x}$ denotes the discrete $L^2$-norm over the numerical sampling, and sum up the results through the mean-empirical error $\bar e = M^{-1}\sum_{i=1}^M e_i$ together with the empirical standard deviation $\big(M^{-1}\sum_{i=1}^M (e_i - \bar e)^2\big)^{1/2}$.
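The tuning quantities and error summaries described above can be sketched in NumPy (the paper itself uses the Matlab wavelet toolbox, so this is a swapped-in illustration, not the authors' code). The sketch assumes the maximal level is $\tfrac12\log_2(|\mathbb T_n|/\log|\mathbb T_n|)$ rounded down, and that the threshold is a constant times $\sqrt{\log|\mathbb T_n|/|\mathbb T_n|}$; all function names are hypothetical.

```python
import numpy as np

def max_resolution(tree_size):
    """Maximal level 0.5 * log2(|T_n| / log|T_n|), rounded down (rounding is assumed)."""
    return int(np.floor(0.5 * np.log2(tree_size / np.log(tree_size))))

def hard_threshold(coeffs, tree_size, const):
    """Keep a coefficient only if its modulus exceeds const * sqrt(log|T_n| / |T_n|)."""
    coeffs = np.asarray(coeffs, dtype=float)
    level = const * np.sqrt(np.log(tree_size) / tree_size)
    return np.where(np.abs(coeffs) > level, coeffs, 0.0)

def relative_error(estimate, truth):
    """Relative discrete L2 error; on a regular grid the mesh factor cancels."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return np.linalg.norm(estimate - truth) / np.linalg.norm(truth)

def summarize(errors):
    """Mean empirical error and standard deviation with 1/M normalisation."""
    errors = np.asarray(errors, dtype=float)
    return errors.mean(), errors.std()

def compression_rate(thresholded):
    """Percentage of coefficients set to zero."""
    thresholded = np.asarray(thresholded)
    return 100.0 * np.mean(thresholded == 0.0)
```

For instance, `hard_threshold(coeffs, 32767, 10)` zeroes every coefficient whose modulus is below roughly 0.18.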
[Figure 1: plot of the sample autocorrelation against the lag; image not reproduced.]

Figure 1. Sample autocorrelation of ordered $(X_{u0}, u \in \mathbb G_{n-1})$ for $n = 15$. Note: due to the binary tree structure the lags are $\{4, 6, 6, \ldots\}$. As expected, we observe a fast decorrelation.


Table 1 displays the numerical results we obtained, also giving the compression rate (columns %) defined as the number of wavelet coefficients put to zero divided by the total number of coefficients. We choose an oracle error as benchmark: the oracle estimator is computed by picking the best resolution level $J^*$ with no coefficient thresholded. We also display the results when constructing $\widehat B_n$ with $\mathbb G_n$ (instead of $\mathbb T_n$), in which case an analog of Theorem 11 holds. The thresholding estimator behaves quite well compared to the oracle for the large spike and achieves the same performance for the high spike.






                          n = 12                                            n = 15
                  Oracle                Threshold est.            Oracle                Threshold est.
                  Mean    (sd.)     J*  Mean    (sd.)     %       Mean    (sd.)     J*  Mean    (sd.)     %
Large spike  T_n  0.0677  (0.0159)  5   0.1020  (0.0196)  96.6    0.0324  (0.0055)  6   0.0502  (0.0055)  97.1
             G_n  0.0933  (0.0202)  5   0.1454  (0.0267)  97.9    0.0453  (0.0081)  6   0.0728  (0.0097)  96.7
High spike   T_n  0.1343  (0.0180)  7   0.1281  (0.0163)  97.4    0.0586  (0.0059)  8   0.0596  (0.0060)  97.7
             G_n  0.1556  (0.0222)  7   0.1676  (0.0228)  97.7    0.0787  (0.0079)  8   0.0847  (0.0087)  97.9

Table 1. Mean empirical relative error $\bar e$ and its standard deviation, with respect to $n$, for the trial splitting rate $\widetilde B$ specified by $(c,j) = (3,1)$ (large spike) or $(c,j) = (9,4)$ (high spike) reconstructed over the interval $\mathcal D = [1.5, 4.8]$ by the estimator $\widehat B_n$. Rows $T_n$ and $G_n$ refer to estimators built from $\mathbb T_n$ and $\mathbb G_n$ respectively. Note: for $n = 15$, $\tfrac12|\mathbb T_n| = 32\,767$ and $\tfrac12|\mathbb G_n| = 16\,384$; for $n = 12$, $\tfrac12|\mathbb T_n| = 4\,095$ and $\tfrac12|\mathbb G_n| = 2\,048$.
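The sample sizes quoted in the note to Table 1 can be checked from $|\mathbb G_n| = 2^n$ and $|\mathbb T_n| = 2^{n+1} - 1$: the figures 32 767, 16 384, 4 095 and 2 048 coincide with half of these sizes, rounded down. The halving is presumably a consequence of setting $X_{u1} = X_{u0}$ in the simulation, so that only one child per pair carries new information; that interpretation is an assumption of this sketch.

```python
def generation_size(n):
    """Number of nodes in generation n of the binary tree."""
    return 2 ** n

def tree_size(n):
    """Number of nodes in generations 0..n of the binary tree."""
    return 2 ** (n + 1) - 1

# Half sizes (integer division rounds down), as quoted in the note to Table 1.
half_sizes = {n: (tree_size(n) // 2, generation_size(n) // 2) for n in (12, 15)}
```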
Figure 2 and Figure 3 show the reconstruction of the size-dependent splitting rate $\widetilde B$ and the invariant measure $\nu_{\widetilde B}$ in the two cases (large or high spike) for one typical sample of size $\tfrac12|\mathbb T_n| = 32\,767$. In both cases, the spike is well reconstructed and so are the discontinuities in the derivative of $\widetilde B$. As expected, the spike being localized around $7/2$ for $\widetilde B$, we detect it around $7/4$ for the invariant measure of the sizes at birth $\nu_{\widetilde B}$. The large spike concentrates approximately 50% of the mass of $\nu_{\widetilde B}$ whereas the high spike only concentrates 20% of the mass of $\nu_{\widetilde B}$.




Figure 2. Large spike: reconstruction of the trial splitting rate $\widetilde B$ specified by $(c,j) = (3,1)$ over $\mathcal D = [1.5, 4.8]$ and reconstruction of $\nu_{\widetilde B}$ over $\mathcal D/2$ based on one sample $(X_u, u \in \mathbb T_n)$ for $n = 15$ (i.e. $\tfrac12|\mathbb T_n| = 32\,767$).

Figure 3. High spike: reconstruction of the trial splitting rate $\widetilde B$ specified by $(c,j) = (9,4)$ over $\mathcal D = [1.5, 4.8]$ and reconstruction of $\nu_{\widetilde B}$ over $\mathcal D/2$ based on one sample $(X_u, u \in \mathbb T_n)$ for $n = 15$ (i.e. $\tfrac12|\mathbb T_n| = 32\,767$).


5. Proofs 


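5.1. Proof of Theorem 4(i). Let $g : \mathcal S \to \mathbb R$ such that $|g|_1 < \infty$. Set $\nu(g) = \int_{\mathcal S} g(x)\nu(dx)$ and $\widetilde g = g - \nu(g)$. Let $n \ge 2$. By the usual Chernoff bound argument, for every $\lambda > 0$, we have

(21) $\mathbb P\Big(\frac{1}{|\mathbb G_n|}\sum_{u \in \mathbb G_n} \widetilde g(X_u) \ge \delta\Big) \le \exp(-\lambda|\mathbb G_n|\delta)\,\mathbb E\Big[\exp\Big(\lambda \sum_{u \in \mathbb G_n} \widetilde g(X_u)\Big)\Big].$

Step 1. We have

$\mathbb E\Big[\exp\Big(\lambda\sum_{u\in\mathbb G_n}\widetilde g(X_u)\Big)\,\Big|\,\mathcal F_{n-1}\Big] = \mathbb E\Big[\prod_{u\in\mathbb G_{n-1}}\exp\big(\lambda(\widetilde g(X_{u0}) + \widetilde g(X_{u1}))\big)\,\Big|\,\mathcal F_{n-1}\Big] = \prod_{u\in\mathbb G_{n-1}}\mathbb E\Big[\exp\big(\lambda(\widetilde g(X_{u0}) + \widetilde g(X_{u1}))\big)\,\Big|\,\mathcal F_{n-1}\Big]$

thanks to the conditional independence of the $(X_{u0}, X_{u1})_{u\in\mathbb G_{n-1}}$ given $\mathcal F_{n-1}$, as follows from Definition 1. We rewrite this last term as

$\prod_{u\in\mathbb G_{n-1}}\mathbb E\Big[\exp\big(\lambda(\widetilde g(X_{u0}) + \widetilde g(X_{u1}) - 2\mathcal Q\widetilde g(X_u))\big)\,\Big|\,\mathcal F_{n-1}\Big]\exp\big(\lambda 2\mathcal Q\widetilde g(X_u)\big),$

inserting the $\mathcal F_{n-1}$-measurable random variable $2\mathcal Q\widetilde g(X_u)$ for $u \in \mathbb G_{n-1}$. Moreover, the bifurcating structure of $(X_u)_{u\in\mathbb T}$ implies

(22) $\mathbb E\big[\widetilde g(X_{u0}) + \widetilde g(X_{u1}) - 2\mathcal Q\widetilde g(X_u)\,\big|\,\mathcal F_{n-1}\big] = 0, \quad u \in \mathbb G_{n-1},$

since $\mathcal Q = \tfrac12(\mathcal P_0 + \mathcal P_1)$. We will also need the following bound, proof of which is delayed until Appendix.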

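Lemma 13. Work under Assumptions 2 and 3. For all $r = 0, \ldots, n-1$ and $u \in \mathbb G_{n-r-1}$, we have

$\big|2^r\big(\mathcal Q^r\widetilde g(X_{u0}) + \mathcal Q^r\widetilde g(X_{u1}) - 2\mathcal Q^{r+1}\widetilde g(X_u)\big)\big| \le c_1|g|_\infty$

and

$\mathbb E\Big[\big(2^r(\mathcal Q^r\widetilde g(X_{u0}) + \mathcal Q^r\widetilde g(X_{u1}) - 2\mathcal Q^{r+1}\widetilde g(X_u))\big)^2\,\Big|\,\mathcal F_{n-r-1}\Big] \le c_2\sigma_r^2(g)$

with $c_1 = 4\max\{1 + R\rho,\ R(1+\rho)\}$, $c_2 = 4\max\{|\mathcal Q|_{\mathcal D},\ 4|\mathcal Q|_{\mathcal D}^2,\ 4R^2(1+\rho)^2\}$ and

(23) $\sigma_r^2(g) = |g|_2^2$ for $r = 0$, and $\sigma_r^2(g) = \min\big\{|g|_1^2\,2^{2r},\ |g|_\infty^2(2\rho)^{2r}\big\}$ for $r = 1, \ldots, n-1$.

(Recall that $|\mathcal Q|_{\mathcal D} = \sup_{x\in\mathcal S,\,y\in\mathcal D}\mathcal Q(x,y)$ and $R$, $\rho$ are defined via Assumption 3.)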
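In view of (22) and Lemma 13 for $r = 0$, we plan to use the bound

(24) $\mathbb E[\exp(\lambda Z)] \le \exp\Big(\dfrac{\lambda^2\sigma^2}{2(1 - \lambda M/3)}\Big)$

valid for any $\lambda \in (0, 3/M)$, any random variable $Z$ such that $|Z| \le M$, $\mathbb E[Z] = 0$ and $\mathbb E[Z^2] \le \sigma^2$. Thus, for any $\lambda \in (0, 3/(c_1|g|_\infty))$ and any $u \in \mathbb G_{n-1}$, with $Z = \widetilde g(X_{u0}) + \widetilde g(X_{u1}) - 2\mathcal Q\widetilde g(X_u)$, we obtain

$\mathbb E\Big[\exp\big(\lambda(\widetilde g(X_{u0}) + \widetilde g(X_{u1}) - 2\mathcal Q\widetilde g(X_u))\big)\,\Big|\,\mathcal F_{n-1}\Big] \le \exp\Big(\dfrac{\lambda^2 c_2\sigma_0^2(g)}{2(1 - \lambda c_1|g|_\infty/3)}\Big).$

It follows that

(25) $\mathbb E\Big[\exp\Big(\lambda\sum_{u\in\mathbb G_n}\widetilde g(X_u)\Big)\,\Big|\,\mathcal F_{n-1}\Big] \le \exp\Big(\dfrac{\lambda^2 c_2\sigma_0^2(g)|\mathbb G_{n-1}|}{2(1 - \lambda c_1|g|_\infty/3)}\Big)\prod_{u\in\mathbb G_{n-1}}\exp\big(\lambda 2\mathcal Q\widetilde g(X_u)\big).$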
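The exponential moment bound (24) can be checked numerically in a simple case. The sketch below takes $Z = \pm M$ with probability $1/2$ each, so that $\mathbb E[Z] = 0$, $|Z| \le M$, $\mathbb E[Z^2] = M^2$ and $\mathbb E[\exp(\lambda Z)] = \cosh(\lambda M)$; this two-point distribution and the function name are illustrative choices, and the check is not a proof.

```python
import numpy as np

def bernstein_mgf_bound(lam, sigma2, M):
    """Right-hand side of (24): exp(lam^2 sigma^2 / (2 (1 - lam M / 3)))."""
    return np.exp(lam ** 2 * sigma2 / (2.0 * (1.0 - lam * M / 3.0)))

M = 2.0
# Grid of admissible lambdas inside (0, 3/M).
lams = np.linspace(0.01, 0.99 * 3.0 / M, 200)
exact = np.cosh(lams * M)                    # E[exp(lam Z)] for Z = +/- M
bound = bernstein_mgf_bound(lams, M ** 2, M)
holds = bool(np.all(exact <= bound))
```

The bound degrades as $\lambda$ approaches $3/M$, which is why the admissible range of $\lambda$ has to be tracked through the proof.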

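Step 2. We iterate the procedure in Step 1. Conditioning with respect to $\mathcal F_{n-2}$, we need to control

$\mathbb E\Big[\prod_{u\in\mathbb G_{n-1}}\exp\big(\lambda 2\mathcal Q\widetilde g(X_u)\big)\,\Big|\,\mathcal F_{n-2}\Big]$

and more generally, for $1 \le r \le n-1$:

$\mathbb E\Big[\prod_{u\in\mathbb G_{n-r}}\exp\big(\lambda 2^r\mathcal Q^r\widetilde g(X_u)\big)\,\Big|\,\mathcal F_{n-r-1}\Big] = \prod_{u\in\mathbb G_{n-r-1}}\mathbb E\Big[\exp\big(\lambda 2^r(\mathcal Q^r\widetilde g(X_{u0}) + \mathcal Q^r\widetilde g(X_{u1}) - 2\mathcal Q^{r+1}\widetilde g(X_u))\big)\,\Big|\,\mathcal F_{n-r-1}\Big] \times \exp\big(\lambda 2^{r+1}\mathcal Q^{r+1}\widetilde g(X_u)\big),$

the last equality being obtained thanks to the conditional independence of the $(X_{u0}, X_{u1})_{u\in\mathbb G_{n-r-1}}$ given $\mathcal F_{n-r-1}$. We plan to use (24) again: for $u \in \mathbb G_{n-r-1}$, we have

$\mathbb E\big[2^r(\mathcal Q^r\widetilde g(X_{u0}) + \mathcal Q^r\widetilde g(X_{u1}) - 2\mathcal Q^{r+1}\widetilde g(X_u))\,\big|\,\mathcal F_{n-r-1}\big] = 0$

and the conditional variance given $\mathcal F_{n-r-1}$ can be controlled using Lemma 13. Using recursively (24), for $r = 1, \ldots, n-1$,

$\mathbb E\Big[\prod_{u\in\mathbb G_{n-1}}\exp\big(\lambda 2\mathcal Q\widetilde g(X_u)\big)\,\Big|\,\mathcal F_0\Big] \le \prod_{r=1}^{n-1}\exp\Big(\dfrac{\lambda^2 c_2\sigma_r^2(g)|\mathbb G_{n-r-1}|}{2(1 - \lambda c_1|g|_\infty/3)}\Big)\exp\big(\lambda 2^n\mathcal Q^n\widetilde g(X_\emptyset)\big)$

for $\lambda \in (0, 3/(c_1|g|_\infty))$. By Assumption 3,

$\exp\big(\lambda 2^n\mathcal Q^n\widetilde g(X_\emptyset)\big) \le \exp\big(\lambda 2^n R(2|g|_\infty)\rho^n\big) \le \exp(\lambda 2R|g|_\infty)$

since $\rho < 1/2$. In conclusion

$\mathbb E\Big[\prod_{u\in\mathbb G_{n-1}}\exp\big(\lambda 2\mathcal Q\widetilde g(X_u)\big)\Big] \le \exp\Big(\dfrac{\lambda^2 c_2\sum_{r=1}^{n-1}\sigma_r^2(g)|\mathbb G_{n-r-1}|}{2(1 - \lambda c_1|g|_\infty/3)}\Big)\exp(\lambda 2R|g|_\infty).$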

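Step 3. Let $1 \le \ell \le n-1$. By definition of $\sigma_r^2(g)$, recall (23), and using the fact that $(2\rho)^{2r} \le 1$, since moreover $|\mathbb G_{n-r-1}| = 2^{n-r-1}$, we successively obtain

$\sum_{r=1}^{n-1}\sigma_r^2(g)2^{n-r-1} \le 2^{n-1}\Big(|g|_1^2\sum_{r=1}^{\ell}2^r + |g|_\infty^2\sum_{r=\ell+1}^{n-1}2^{-r}(2\rho)^{2r}\Big) \le 2^n\big(|g|_1^2 2^{\ell} + |g|_\infty^2 2^{-\ell}\big) \le |\mathbb G_n|\phi_n(g)$

for an appropriate choice of $\ell$, with $\phi_n(g) = \min_{1\le\ell\le n-1}\big(|g|_1^2 2^{\ell} + |g|_\infty^2 2^{-\ell}\big)$. It follows that

(26) $\mathbb E\Big[\prod_{u\in\mathbb G_{n-1}}\exp\big(\lambda 2\mathcal Q\widetilde g(X_u)\big)\Big] \le \exp\Big(\dfrac{\lambda^2 c_2|\mathbb G_n|\phi_n(g)}{2(1 - \lambda c_1|g|_\infty/3)} + \lambda 2R|g|_\infty\Big).$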
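The quantity $\phi_n(g)$, and the variance proxy $|g|_2^2 + \phi_n(g)$ that enters the final bound, only depend on three norms of $g$ and are cheap to evaluate. The following Python sketch computes them by brute force over $\ell$; the function names are hypothetical, and the example norms mimic a function with $|g|_1 = 2^{-j/2}$, $|g|_2 = 1$, $|g|_\infty = 2^{j/2}$ (an illustrative scaling, here with $j = 4$).

```python
def phi_n(norm1, norm_inf, n):
    """phi_n(g) = min over 1 <= l <= n-1 of |g|_1^2 2^l + |g|_inf^2 2^(-l)."""
    return min(norm1 ** 2 * 2.0 ** l + norm_inf ** 2 * 2.0 ** (-l)
               for l in range(1, n))

def sigma_1n(norm1, norm2, norm_inf, n):
    """|g|_2^2 + phi_n(g) for n >= 2, and |g|_2^2 for n = 1."""
    if n == 1:
        return norm2 ** 2
    return norm2 ** 2 + phi_n(norm1, norm_inf, n)

# Example: |g|_1 = 2^-2, |g|_2 = 1, |g|_inf = 2^2; the minimum is attained at l = 4.
example = sigma_1n(2.0 ** -2, 1.0, 2.0 ** 2, n=10)
```

The minimisation balances the $L^1$ and $L^\infty$ norms, so the proxy stays bounded for such functions as long as $j \le n - 1$.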

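Step 4. Putting together the estimates (25) and (26) and coming back to (21), we obtain

$\mathbb P\Big(\frac{1}{|\mathbb G_n|}\sum_{u\in\mathbb G_n}\widetilde g(X_u) \ge \delta\Big) \le \exp\Big(-\lambda|\mathbb G_n|\delta + \dfrac{\lambda^2 c_2|\mathbb G_n|\Sigma_{1,n}(g)}{2(1 - \lambda c_1|g|_\infty/3)} + \lambda 2R|g|_\infty\Big),$

with $\Sigma_{1,n}(g) = |g|_2^2 + \phi_n(g)$ for $n \ge 2$ and $\Sigma_{1,1}(g) = \sigma_0^2(g) = |g|_2^2$. Since $\delta$ is such that $2R|g|_\infty \le |\mathbb G_n|\delta/2$, we obtain

$\mathbb P\Big(\frac{1}{|\mathbb G_n|}\sum_{u\in\mathbb G_n}\widetilde g(X_u) \ge \delta\Big) \le \exp\Big(-\lambda|\mathbb G_n|\dfrac{\delta}{2} + \dfrac{\lambda^2 c_2|\mathbb G_n|\Sigma_{1,n}(g)}{2(1 - \lambda c_1|g|_\infty/3)}\Big).$

The admissible choice $\lambda = \delta/\big(\tfrac13\delta c_1|g|_\infty + 2c_2\Sigma_{1,n}(g)\big)$ yields the result.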
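To see why a choice of $\lambda$ of this type closes the argument, here is a short worked computation in generic notation. The shorthand $a$, $b$, $c$ is introduced only for this sketch, and the value $\lambda^\ast$ below is one convenient admissible choice rather than the exact minimiser.

```latex
% Shorthand (not in the paper):
%   a = |G_n| \delta / 2,  b = c_2 |G_n| \Sigma_{1,n}(g),  c = c_1 |g|_\infty / 3.
\[
  f(\lambda) = -\lambda a + \frac{\lambda^2 b}{2(1-\lambda c)}, \qquad 0 < \lambda < 1/c .
\]
% Take \lambda^* = a / (b + a c), which lies in (0, 1/c).
\[
  1 - \lambda^\ast c = \frac{b}{b+ac}, \qquad
  f(\lambda^\ast) = -\frac{a^2}{b+ac} + \frac{a^2}{2(b+ac)} = -\frac{a^2}{2(b+ac)} .
\]
% Substituting back a, b, c:
\[
  \lambda^\ast = \frac{\delta}{2 c_2 \Sigma_{1,n}(g) + \tfrac13 c_1 |g|_\infty \delta},
  \qquad
  f(\lambda^\ast) = -\frac{|G_n|\,\delta^2}{8 c_2 \Sigma_{1,n}(g) + \tfrac43 c_1 |g|_\infty \delta} .
\]
```

The resulting exponent has the Bernstein form, with a variance term proportional to $\Sigma_{1,n}(g)$ and a range term proportional to $|g|_\infty\delta$.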


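5.2. Proof of Theorem 4 (ii). Step 1. Similarly to (21), we plan to use

(27) $\mathbb P\Big(\frac{1}{|\mathbb T_n|}\sum_{u\in\mathbb T_n}\widetilde g(X_u) \ge \delta\Big) \le \exp(-\lambda|\mathbb T_n|\delta)\,\mathbb E\Big[\exp\Big(\lambda\sum_{u\in\mathbb T_n}\widetilde g(X_u)\Big)\Big]$

for a specific choice of $\lambda > 0$. We first need to control

$\mathbb E\Big[\exp\Big(\lambda\sum_{u\in\mathbb T_n}\widetilde g(X_u)\Big)\,\Big|\,\mathcal F_{n-1}\Big] = \prod_{u\in\mathbb T_{n-1}}\exp\big(\lambda\widetilde g(X_u)\big)\,\mathbb E\Big[\exp\Big(\lambda\sum_{u\in\mathbb G_n}\widetilde g(X_u)\Big)\,\Big|\,\mathcal F_{n-1}\Big].$

Using (25) to control $\mathbb E\big[\exp\big(\lambda\sum_{u\in\mathbb G_n}\widetilde g(X_u)\big)\,\big|\,\mathcal F_{n-1}\big]$, we obtain

$\mathbb E\Big[\exp\Big(\lambda\sum_{u\in\mathbb T_n}\widetilde g(X_u)\Big)\,\Big|\,\mathcal F_{n-1}\Big] \le \exp\Big(\dfrac{\lambda^2 c_2\sigma_0^2(g)|\mathbb G_{n-1}|}{2(1 - \lambda c_1|g|_\infty/3)}\Big)\prod_{u\in\mathbb G_{n-1}}\exp\big(\lambda 2\mathcal Q\widetilde g(X_u)\big)\prod_{u\in\mathbb T_{n-1}}\exp\big(\lambda\widetilde g(X_u)\big).$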

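Step 2. We iterate the procedure. At the second step, conditioning w.r.t. $\mathcal F_{n-2}$, we need to control

$\mathbb E\Big[\prod_{u\in\mathbb T_{n-2}}\exp\big(\lambda\widetilde g(X_u)\big)\prod_{u\in\mathbb G_{n-1}}\exp\big(\lambda\widetilde g(X_u) + 2\lambda\mathcal Q\widetilde g(X_u)\big)\,\Big|\,\mathcal F_{n-2}\Big]$

and more generally, at the $(r+1)$-th step (for $1 \le r \le n-1$), we need to control

$\mathbb E\Big[\prod_{u\in\mathbb T_{n-r-1}}\exp\big(\lambda\widetilde g(X_u)\big)\prod_{u\in\mathbb G_{n-r}}\exp\Big(\lambda\sum_{m=0}^{r}2^m\mathcal Q^m\widetilde g(X_u)\Big)\,\Big|\,\mathcal F_{n-r-1}\Big]$

$= \prod_{u\in\mathbb T_{n-r-2}}\exp\big(\lambda\widetilde g(X_u)\big)\prod_{u\in\mathbb G_{n-r-1}}\exp\Big(\lambda\sum_{m=0}^{r+1}2^m\mathcal Q^m\widetilde g(X_u)\Big) \times \mathbb E\Big[\exp\big(\lambda\Upsilon_r(X_u, X_{u0}, X_{u1})\big)\,\Big|\,\mathcal F_{n-r-1}\Big],$

where we set

$\Upsilon_r(X_u, X_{u0}, X_{u1}) = \sum_{m=0}^{r}2^m\big(\mathcal Q^m\widetilde g(X_{u0}) + \mathcal Q^m\widetilde g(X_{u1}) - 2\mathcal Q^{m+1}\widetilde g(X_u)\big).$

This representation successively follows from the $\mathcal F_{n-r-1}$-measurability of the random variable $\prod_{u\in\mathbb T_{n-r-1}}\exp\big(\lambda\widetilde g(X_u)\big)$, the identity

$\prod_{u\in\mathbb G_{n-r}}\exp\big(F(X_u)\big) = \prod_{u\in\mathbb G_{n-r-1}}\exp\big(F(X_{u0}) + F(X_{u1})\big),$

the independence of $(X_{u0}, X_{u1})_{u\in\mathbb G_{n-r-1}}$ conditional on $\mathcal F_{n-r-1}$ and finally the introduction of the term $2\sum_{m=0}^{r}2^m\mathcal Q^{m+1}\widetilde g(X_u)$.

We have, for $u \in \mathbb G_{n-r-1}$,

$\mathbb E\big[\Upsilon_r(X_u, X_{u0}, X_{u1})\,\big|\,\mathcal F_{n-r-1}\big] = 0,$

and we prove in Appendix the following bound.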
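Lemma 14. For any $r = 1, \ldots, n-1$, $u \in \mathbb G_{n-r-1}$, we have

$|\Upsilon_r(X_u, X_{u0}, X_{u1})| \le c_3|g|_\infty$

and

$\mathbb E\big[\Upsilon_r(X_u, X_{u0}, X_{u1})^2\,\big|\,\mathcal F_{n-r-1}\big] \le c_4\sigma_r^2(g) < \infty$

where $c_3 = 4R(1+\rho)(1-2\rho)^{-1}$, $c_4 = 12\max\{|\mathcal Q|_{\mathcal D},\ 16|\mathcal Q|_{\mathcal D}^2,\ 4R^2(1+\rho)^2(1-2\rho)^{-2}\}$ and

(28) $\sigma_r^2(g) = |g|_2^2 + \min_{\ell\ge1}\big(|g|_1^2\,2^{2(\ell\wedge r)} + |g|_\infty^2(2\rho)^{2\ell}\mathbf 1_{\{r>\ell\}}\big).$

(Recall that $|\mathcal Q|_{\mathcal D} = \sup_{x\in\mathcal S,\,y\in\mathcal D}\mathcal Q(x,y)$ and $R$, $\rho$ are defined via Assumption 3.)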

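In the same way as for Step 2 in the proof of Theorem 4(i), we apply recursively (24) for $r = 1, \ldots, n-1$ to obtain

$\mathbb E\Big[\exp\Big(\lambda\sum_{u\in\mathbb T_n}\widetilde g(X_u)\Big)\,\Big|\,\mathcal F_0\Big] \le \prod_{r=0}^{n-1}\exp\Big(\dfrac{c_4\lambda^2\sigma_r^2(g)|\mathbb G_{n-r-1}|}{2(1 - c_3'\lambda|g|_\infty/3)}\Big)\exp\Big(\lambda\sum_{m=0}^{n}2^m\mathcal Q^m\widetilde g(X_\emptyset)\Big),$

if $\lambda \in (0, 3/(c_3'|g|_\infty))$ with $c_3' = \max\{c_1, c_3\} = 4\max\{1 + R\rho,\ R(1+\rho)(1-2\rho)^{-1}\}$ and $\sigma_0^2(g) = |g|_2^2$ in order to include Step 1 (we use $c_4 \ge c_2$ as well). Now, by Assumption 3, this last term can be bounded by

$\exp\Big(\lambda\sum_{m=0}^{n}2^m\big(R|g|_\infty\rho^m\big)\Big) \le \exp\big(\lambda 2R(1-2\rho)^{-1}|g|_\infty\big)$

since $\rho < 1/2$. Since $|\mathbb G_{n-r-1}| = 2^{n-r-1}$, by definition of $\sigma_r^2(g)$, recall (28), for any $1 \le \ell \le n-1$ and using moreover that $(2\rho)^{2\ell} \le 1$, we obtain

$\sum_{r=0}^{n-1}\sigma_r^2(g)|\mathbb G_{n-r-1}| \le 2^{n-1}\Big(|g|_2^2\sum_{r=0}^{n-1}2^{-r} + |g|_1^2\Big(\sum_{r=1}^{\ell}2^{2r}2^{-r} + 2^{2\ell}\sum_{r=\ell+1}^{n-1}2^{-r}\Big) + |g|_\infty^2\sum_{r=\ell+1}^{n-1}2^{-r}\Big) \le |\mathbb T_n|\Sigma_{1,n}(g),$

where $\Sigma_{1,n}(g)$ is defined in (3). Thus

$\mathbb E\Big[\exp\Big(\lambda\sum_{u\in\mathbb T_n}\widetilde g(X_u)\Big)\Big] \le \exp\Big(\dfrac{c_4\lambda^2|\mathbb T_n|\Sigma_{1,n}(g)}{2(1 - c_3'\lambda|g|_\infty/3)} + \lambda 2R(1-2\rho)^{-1}|g|_\infty\Big).$

Step 3. Coming back to (27), for $\delta > 0$ such that $2R(1-2\rho)^{-1}|g|_\infty \le |\mathbb T_n|\delta/2$, we obtain

$\mathbb P\Big(\frac{1}{|\mathbb T_n|}\sum_{u\in\mathbb T_n}\widetilde g(X_u) \ge \delta\Big) \le \exp\Big(-\lambda|\mathbb T_n|\dfrac{\delta}{2} + \dfrac{c_4\lambda^2|\mathbb T_n|\Sigma_{1,n}(g)}{2(1 - c_3'\lambda|g|_\infty/3)}\Big).$

We conclude in the same way as in Step 4 of the proof of Theorem 4 (i).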
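5.3. Proof of Theorem 5 (i). The strategy of proof is similar as for Theorem 4. Let $g : \mathcal S^3 \to \mathbb R$ such that $|g|_1 < \infty$ and set $\widetilde g = g - \nu(\mathcal P g)$. Let $n \ge 2$ (if $n = 1$, set $\Sigma_{2,1}(g) = |\mathcal Q(\mathcal P g)|_\infty$). Introduce the notation $\Delta_u = (X_u, X_{u0}, X_{u1})$ for simplicity. For every $\lambda > 0$, the usual Chernoff bound reads

(29) $\mathbb P\Big(\frac{1}{|\mathbb G_n|}\sum_{u\in\mathbb G_n}\widetilde g(\Delta_u) \ge \delta\Big) \le \exp(-\lambda|\mathbb G_n|\delta)\,\mathbb E\Big[\exp\Big(\lambda\sum_{u\in\mathbb G_n}\widetilde g(\Delta_u)\Big)\Big].$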

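Step 1. We first need to control

$\mathbb E\Big[\exp\Big(\lambda\sum_{u\in\mathbb G_n}\widetilde g(\Delta_u)\Big)\,\Big|\,\mathcal F_{n-1}\Big] = \mathbb E\Big[\prod_{u\in\mathbb G_{n-1}}\exp\big(\lambda(\widetilde g(\Delta_{u0}) + \widetilde g(\Delta_{u1}))\big)\,\Big|\,\mathcal F_{n-1}\Big] = \prod_{u\in\mathbb G_{n-1}}\mathbb E\Big[\exp\big(\lambda(\widetilde g(\Delta_{u0}) + \widetilde g(\Delta_{u1}))\big)\,\Big|\,\mathcal F_{n-1}\Big]$

using the conditional independence of the $(\Delta_{u0}, \Delta_{u1})$ for $u \in \mathbb G_{n-1}$ given $\mathcal F_{n-1}$. Inserting the term $2\mathcal Q(\mathcal P\widetilde g)(X_u)$, this last quantity is also equal to

$\prod_{u\in\mathbb G_{n-1}}\mathbb E\Big[\exp\big(\lambda(\widetilde g(\Delta_{u0}) + \widetilde g(\Delta_{u1}) - 2\mathcal Q(\mathcal P\widetilde g)(X_u))\big)\,\Big|\,\mathcal F_{n-1}\Big]\exp\big(\lambda 2\mathcal Q(\mathcal P\widetilde g)(X_u)\big).$

For $u \in \mathbb G_{n-1}$ we successively have

$\mathbb E\big[\widetilde g(\Delta_{u0}) + \widetilde g(\Delta_{u1}) - 2\mathcal Q(\mathcal P\widetilde g)(X_u)\,\big|\,\mathcal F_{n-1}\big] = 0,$

$\big|\widetilde g(\Delta_{u0}) + \widetilde g(\Delta_{u1}) - 2\mathcal Q(\mathcal P\widetilde g)(X_u)\big| \le 4(1 + R\rho)|g|_\infty$

and

$\mathbb E\big[\big(\widetilde g(\Delta_{u0}) + \widetilde g(\Delta_{u1}) - 2\mathcal Q(\mathcal P\widetilde g)(X_u)\big)^2\,\big|\,\mathcal F_{n-1}\big] \le 4|\mathcal Q|_{\mathcal D}|\mathcal P g^2|_1,$

with $|\mathcal Q|_{\mathcal D} = \sup_{x\in\mathcal S,\,y\in\mathcal D}\mathcal Q(x,y)$ and $R$, $\rho$ defined via Assumption 3. The first equality is obtained by conditioning first on $\mathcal F_n$ then on $\mathcal F_{n-1}$. The last two estimates are obtained in the same line as the proof of Lemma 13 for $r = 0$, using in particular $\mathcal Q(\mathcal P g^2)(x) = \int_{\mathcal S}\mathcal P g^2(y)\mathcal Q(x,y)\mathfrak n(dy) \le |\mathcal Q|_{\mathcal D}|\mathcal P g^2|_1$ since $\mathcal P g^2$ vanishes outside $\mathcal D$.

Finally, thanks to (24) with $Z = \widetilde g(\Delta_{u0}) + \widetilde g(\Delta_{u1}) - 2\mathcal Q(\mathcal P\widetilde g)(X_u)$, we infer

(30) $\mathbb E\Big[\exp\Big(\lambda\sum_{u\in\mathbb G_n}\widetilde g(\Delta_u)\Big)\,\Big|\,\mathcal F_{n-1}\Big] \le \exp\Big(\dfrac{\lambda^2 4|\mathcal Q|_{\mathcal D}|\mathcal P g^2|_1|\mathbb G_{n-1}|}{2(1 - \lambda 4(1+R\rho)|g|_\infty/3)}\Big)\prod_{u\in\mathbb G_{n-1}}\exp\big(\lambda 2\mathcal Q(\mathcal P\widetilde g)(X_u)\big)$

for $\lambda \in \big(0, 3/(4(1+R\rho)|g|_\infty)\big)$.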

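Step 2. We wish to control $\mathbb E\big[\prod_{u\in\mathbb G_{n-1}}\exp\big(\lambda 2\mathcal Q(\mathcal P\widetilde g)(X_u)\big)\big]$. We are back to Step 2 and Step 3 of the proof of Theorem 4 (i), replacing $\widetilde g$ by $\mathcal P\widetilde g$, which satisfies $\nu(\mathcal P\widetilde g) = 0$. Equation (26) entails

(31) $\mathbb E\Big[\prod_{u\in\mathbb G_{n-1}}\exp\big(\lambda 2\mathcal Q(\mathcal P\widetilde g)(X_u)\big)\Big] \le \exp\Big(\dfrac{\lambda^2 c_2|\mathbb G_n|\phi_n(\mathcal P g)}{2(1 - \lambda c_1|\mathcal P g|_\infty/3)} + \lambda 2R|\mathcal P g|_\infty\Big)$

with $\phi_n(\mathcal P g) = \min_{1\le\ell\le n-1}\big(|\mathcal P g|_1^2 2^{\ell} + |\mathcal P g|_\infty^2 2^{-\ell}\big)$ and $c_1 = 4\max\{1 + R\rho,\ R(1+\rho)\}$, $c_2 = 4\max\{|\mathcal Q|_{\mathcal D},\ 4|\mathcal Q|_{\mathcal D}^2,\ 4R^2(1+\rho)^2\}$.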

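Step 3. Putting together (30) and (31), we obtain

(32) $\mathbb E\Big[\exp\Big(\lambda\sum_{u\in\mathbb G_n}\widetilde g(\Delta_u)\Big)\Big] \le \exp\Big(\dfrac{\lambda^2 c_2|\mathbb G_n|\Sigma_{2,n}(g)}{2(1 - \lambda c_1|g|_\infty/3)} + \lambda 2R|\mathcal P g|_\infty\Big)$

with $\Sigma_{2,n}(g) = |\mathcal P g^2|_1 + \phi_n(\mathcal P g)$ and using moreover $|g|_\infty \ge |\mathcal P g|_\infty$ and $c_1 \ge 4(1 + R\rho)$. Back to (29), since $2R|\mathcal P g|_\infty \le |\mathbb G_n|\delta/2$ we finally infer

$\mathbb P\Big(\frac{1}{|\mathbb G_n|}\sum_{u\in\mathbb G_n}g(\Delta_u) - \nu(\mathcal P g) \ge \delta\Big) \le \exp\Big(-\lambda|\mathbb G_n|\dfrac{\delta}{2} + \dfrac{\lambda^2 c_2|\mathbb G_n|\Sigma_{2,n}(g)}{2(1 - \lambda c_1|g|_\infty/3)}\Big).$

We conclude in the same way as in Step 4 of the proof of Theorem 4 (i).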

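5.4. Proof of Theorem 5 (ii). In the same way as before, for every $\lambda > 0$,

(33) $\mathbb P\Big(\frac{1}{|\mathbb T_{n-1}|}\sum_{u\in\mathbb T_{n-1}}\widetilde g(\Delta_u) \ge \delta\Big) \le e^{-\lambda|\mathbb T_{n-1}|\delta}\,\mathbb E\Big[\exp\Big(\lambda\sum_{u\in\mathbb T_{n-1}}\widetilde g(\Delta_u)\Big)\Big].$

Introduce $\Sigma'_{2,0}(g) = |\mathcal P g^2|_1$ and

$\Sigma'_{2,n}(g) = |\mathcal P g^2|_1 + \inf_{\ell\ge1}\big(|\mathcal P g|_1^2\,2^{\ell\wedge(n-1)} + |\mathcal P g|_\infty^2 2^{-\ell}\mathbf 1_{\{\ell<n-1\}}\big)$, for $n \ge 1$.

It is not difficult to check that (32) is still valid when replacing $\Sigma_{2,n}$ by $\Sigma'_{2,n}$. We plan to successively expand the sum over the whole tree $\mathbb T_{n-1}$ into sums over each generation $\mathbb G_m$ for $m = 0, \ldots, n-1$, apply Hölder inequality, apply inequality (32) repeatedly (with $\Sigma'_{2,m}$) together with the bound

$\sum_{m=0}^{n-1}|\mathbb G_m|\Sigma'_{2,m}(g) \le |\mathbb T_{n-1}|\Sigma_{2,n-1}(g).$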

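We thus obtain

$\mathbb E\Big[\exp\Big(\lambda\sum_{u\in\mathbb T_{n-1}}\widetilde g(\Delta_u)\Big)\Big] = \mathbb E\Big[\prod_{m=0}^{n-1}\exp\Big(\lambda\sum_{u\in\mathbb G_m}\widetilde g(\Delta_u)\Big)\Big]$

$\le \Big(\mathbb E\big[\exp\big(n\lambda\widetilde g(\Delta_\emptyset)\big)\big]\prod_{m=1}^{n-1}\mathbb E\Big[\exp\Big(n\lambda\sum_{u\in\mathbb G_m}\widetilde g(\Delta_u)\Big)\Big]\Big)^{1/n}$

$\le \Big(\exp\big(n\lambda 2|g|_\infty\big)\prod_{m=1}^{n-1}\exp\Big(\dfrac{(n\lambda)^2 c_2|\mathbb G_m|\Sigma'_{2,m}(g)}{2(1 - (n\lambda)c_1|g|_\infty/3)} + (n\lambda)2R|\mathcal P g|_\infty\Big)\Big)^{1/n}$

$\le \exp\Big(\dfrac{\lambda^2 c_2 n|\mathbb T_{n-1}|\Sigma_{2,n-1}(g)}{2(1 - c_1(n\lambda)|g|_\infty/3)} + 2\lambda\big(nR|\mathcal P g|_\infty + |g|_\infty\big)\Big).$

Coming back to (33) and using $2\big(nR|\mathcal P g|_\infty + |g|_\infty\big) \le |\mathbb T_{n-1}|\delta/2$, we obtain

$\mathbb P\Big(\frac{1}{|\mathbb T_{n-1}|}\sum_{u\in\mathbb T_{n-1}}\widetilde g(\Delta_u) \ge \delta\Big) \le \exp\Big(-\lambda|\mathbb T_{n-1}|\dfrac{\delta}{2} + \dfrac{\lambda^2 c_2 n|\mathbb T_{n-1}|\Sigma_{2,n-1}(g)}{2(1 - (n\lambda)c_1|g|_\infty/3)}\Big).$

We conclude in the same way as in Step 4 of the proof of Theorem 4 (i).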

5.5. Proof of Theorem 8. Put $c(n) = (\log|\mathbb T_n|/|\mathbb T_n|)^{1/2}$ and note that the maximal resolution $J = J_n$ is such that $2^{J_n} \sim c(n)^{-1}$. Theorem 8 is a consequence of the general theory of wavelet threshold estimators, see Kerkyacharian and Picard [32]. We first claim that the following moment bounds and moderate deviation inequalities hold: for every $p \ge 1$,

(34) $\mathbb E\big[|\widehat\nu_{\lambda,n} - \nu_\lambda|^p\big] \lesssim c(n)^p$ for every $|\lambda| \le J_n$

and

(35) $\mathbb P\big(|\widehat\nu_{\lambda,n} - \nu_\lambda| \ge p\kappa c(n)\big) \le c(n)^{2p}$ for every $|\lambda| \le J_n$

provided $\kappa > 0$ is large enough, see Condition (37) below. In turn, we have Conditions (5.1) and (5.2) of Theorem 5.1 of [32] with $\Lambda_n = J_n$ (with the notation of [32]). By Corollary 5.1 and Theorem 6.1 of [32] we obtain Theorem 8.


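It remains to prove (34) and (35). We plan to apply Theorem 4 (ii) with $g = \psi_\lambda$ and $\delta = \delta_n = p\kappa c(n)$. First, we have $|\psi_\lambda|_p \le C_p 2^{|\lambda|(1/2 - 1/p)}$ for $p = 1, 2, \infty$ by (6), so one readily checks that for

$\kappa \ge \tfrac4p R(1-2\rho)^{-1}C_\infty(\log|\mathbb T_n|)^{-1},$

the condition $\delta_n \ge 4R(1-2\rho)^{-1}|\psi_\lambda|_\infty|\mathbb T_n|^{-1}$ is satisfied, and this is always true for large enough $n$. Furthermore, since $2^{|\lambda|} \le 2^{J_n} \le c(n)^{-1}$ it is not difficult to check that

(36) $\Sigma_{1,n}(\psi_\lambda) = |\psi_\lambda|_2^2 + \min_{1\le\ell\le n-1}\big(|\psi_\lambda|_1^2 2^{\ell} + |\psi_\lambda|_\infty^2 2^{-\ell}\big) \le C$

for some $C > 0$ and thus $\kappa_1\Sigma_{1,n}(\psi_\lambda) \le \kappa_1 C = C'$ say. Also $\kappa_2|\psi_\lambda|_\infty\delta_n \le \kappa_2 C_\infty 2^{|\lambda|/2}c(n)p\kappa \le C''p\kappa$, where $C'' > 0$ does not depend on $n$ since $2^{|\lambda|/2} \le c(n)^{-1}$. Theorem 4 (ii) yields

$\mathbb P\big(|\widehat\nu_{\lambda,n} - \nu_\lambda| \ge p\kappa c(n)\big) \le 2\exp\Big(-\dfrac{|\mathbb T_n|p^2\kappa^2 c(n)^2}{C' + C''p\kappa}\Big) \le c(n)^{2p}$

for $\kappa$ such that

(37) $\kappa \ge \tfrac12 C'' + \tfrac12\sqrt{(C'')^2 + 4C'p^{-1}}$

and large enough $n$. Thus (35) is proved. Straightforward computations show that (34) follows using $\mathbb E\big[|\widehat\nu_{\lambda,n} - \nu_\lambda|^p\big] = \int_0^\infty p u^{p-1}\,\mathbb P\big(|\widehat\nu_{\lambda,n} - \nu_\lambda| \ge u\big)\,du$ and (35) again. The proof of Theorem 8 is complete.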
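One way to arrive at a sufficient condition on $\kappa$ of the type (37) is the following worked computation. It is a sketch derived only from the displayed exponential bound and from $|\mathbb T_n|c(n)^2 = \log|\mathbb T_n|$; the auxiliary symbol $\theta$ is introduced here, and the condition obtained is sufficient for large $n$ but need not coincide with the authors' exact statement.

```latex
% Since |T_n| c(n)^2 = \log|T_n|, the exponential bound reads
\[
  2\exp\Big(-\frac{p^2\kappa^2 \log|\mathbb T_n|}{C' + C'' p\kappa}\Big)
  = 2\,|\mathbb T_n|^{-\theta},
  \qquad \theta = \frac{p^2\kappa^2}{C' + C'' p\kappa}.
\]
% The target is c(n)^{2p} = (\log|T_n|)^p |T_n|^{-p}.  If \theta \ge p, then
\[
  2\,|\mathbb T_n|^{-\theta} \le 2\,|\mathbb T_n|^{-p} \le (\log|\mathbb T_n|)^p\,|\mathbb T_n|^{-p}
  \quad\text{as soon as } (\log|\mathbb T_n|)^p \ge 2 .
\]
% Finally \theta \ge p is a quadratic inequality in \kappa:
\[
  p\kappa^2 - C'' p\,\kappa - C' \ge 0
  \iff
  \kappa \ge \tfrac12 C'' + \tfrac12\sqrt{(C'')^2 + 4C'/p}.
\]
```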


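5.6. Preparation for the proof of Theorem 9. For $h : \mathcal S^2 \to \mathbb R$, define $|h|_{\infty,1} = \int_{\mathcal S}\sup_{x\in\mathcal S}|h(x,y)|\,dy$. For $n \ge 2$, set also

(38) $\Sigma_{3,n}(h) = |h|_2^2 + \min_{1\le\ell\le n-1}\big(|h|_1^2 2^{\ell} + |h|_{\infty,1}^2 2^{-\ell}\big).$

Recall that under Assumption 3 with $\mathfrak n(dx) = dx$, we set $f_{\mathcal Q}(x,y) = \nu(x)\mathcal Q(x,y)$. Before proving Theorem 9, we first need the following preliminary estimate.

Lemma 15. Work under Assumption 2 with $\mathfrak n(dx) = dx$ and Assumption 3. Let $h : \mathcal D^2 \to \mathbb R$ be such that $|h f_{\mathcal Q}|_1 < \infty$. For every $n \ge 1$ and for any $\delta \ge 4|h|_\infty(Rn+1)|\mathbb T_n^*|^{-1}$, we have

$\mathbb P\Big(\frac{1}{|\mathbb T_n^*|}\sum_{u\in\mathbb T_n^*}h(X_{u^-}, X_u) - \langle h, f_{\mathcal Q}\rangle \ge \delta\Big) \le \exp\Big(-\dfrac{n^{-1}|\mathbb T_{n-1}|\delta^2}{\kappa_5\Sigma_{3,n}(h) + \kappa_2|h|_\infty\delta}\Big),$

where $\mathbb T_n^* = \mathbb T_n \setminus \{\emptyset\}$ and $\kappa_5 = \max\{|\mathcal Q|_{\mathcal D}, |\mathcal Q|_{\mathcal D}^2\}\kappa_1(\mathcal Q, \mathcal D)$.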


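Proof. We plan to apply Theorem 5 (ii) to $g(x, x_0, x_1) = \tfrac12\big(h(x, x_0) + h(x, x_1)\big)$. Since $\mathcal Q = \tfrac12(\mathcal P_0 + \mathcal P_1)$ we readily have $\mathcal P g(x) = \int_{\mathcal S}h(x,y)\mathcal Q(x,y)\,dy$. Moreover, in that case,

$\dfrac{1}{|\mathbb T_{n-1}|}\sum_{u\in\mathbb T_{n-1}}g(X_u, X_{u0}, X_{u1}) = \dfrac{1}{|\mathbb T_n^*|}\sum_{u\in\mathbb T_n^*}h(X_{u^-}, X_u)$

and $\int_{\mathcal S}\mathcal P g(x)\nu(x)\,dx = \iint h(x,y)\mathcal Q(x,y)\nu(x)\,dx\,dy = \langle h, f_{\mathcal Q}\rangle$. We then simply need to estimate $\Sigma_{2,n}(g)$ defined by (4). It is not difficult to check that the following estimates hold

$|\mathcal P g|_1^2 \le |\mathcal Q|_{\mathcal D}^2|h|_1^2, \quad |\mathcal P g|_\infty^2 \le |\mathcal Q|_{\mathcal D}^2|h|_{\infty,1}^2 \quad\text{and}\quad |\mathcal P g^2|_1 \le |\mathcal Q|_{\mathcal D}|h|_2^2$

since $(\mathcal P g^2)(x) \le \int_{\mathcal S}h(x,y)^2\mathcal Q(x,y)\,dy$. Thus $\Sigma_{2,n}(g) \le \max\{|\mathcal Q|_{\mathcal D}, |\mathcal Q|_{\mathcal D}^2\}\Sigma_{3,n}(h)$ and the result follows. □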


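5.7. Proof of Theorem 9, upper bound. Step 1. We proceed as for Theorem 8. Putting $c(n) = (n\log|\mathbb T_n^*|/|\mathbb T_n^*|)^{1/2}$ and noting that the maximal resolution $J = J_n$ is such that $2^{dJ_n} \sim c(n)^{-1}$ with $d = 2$, we only have to prove that for every $p \ge 1$,

(39) $\mathbb E\big[|\widehat f_{\lambda,n} - f_\lambda|^p\big] \lesssim c(n)^p$ for every $|\lambda| \le J_n$

and

(40) $\mathbb P\big(|\widehat f_{\lambda,n} - f_\lambda| \ge p\kappa c(n)\big) \le c(n)^{2p}$ for every $|\lambda| \le J_n$.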

We plan to apply Lemma 15 with h{x, y) = y) = v) S = Sn = p>cc{n). With the 
notation used in the proof of Theorem 8 one readily checks that for 

2p)-^C^{Rn + l)(log |T*|)-i 

the condition > 4|'0||oo(i?n + 1)|T* is satisfied, and this is always true for large enough n 
and 


(41) 4(l-2p)-iCoo(2i?+l). 

Furthermore, since IV'aIp ^ for p = l,2,oo and 2'^!'''! < 2'^'^" < c(n) ^ we can 

easily check 

, min + \A\IoA-') < C 

Kt<n— 1 


for some C > 0, and thus = C" say. Also, K 2 \’tpf\aoA < K2Coo2‘^l^l/^c(n)p>f < 


C'px, where C" does not depend on n. Applying Lemma 15, we derive 


]P(I/A,n - /a| > p>cc{n)) < 2exp - 


n dTr„_i|p^>ir^c(n) 
C" + C"p>c 


< c{nA 


as soon as x satisfies (41) and (37) (with appropriate changes for C and C"). Thus (40) is proved 
and (39) follows likewise. By [32] (Corollary 5.1 and Theorem 6.1), we obtain 


(42) 


E([ll7n 



nlog |T„|^^“2(s.p.7I•) 
|T„| ) 


as soon as ||/q||b» is finite, as follows from fQ{x, y) = Q{x, y)v{x) and the fact that ||i^||b» ^{d) 

is finite too. The last statement can be readily seen from the representation ^{x) = fg b(p)Q(p, x)dy 
and the definition of Besov spaces in terms of moduli of continuity, see e.g. Meyer [35] or Hardle 
et al. [30], using moreover that tt > 1. 


Step 2. Since Q{x,y) = fQ{x,y)/v{x) and Q„(ai,p) = fn{x,y)/ max{9„(a;),zu}, we readily have 

|Q„(a;,p) - Q{x,y)\P < ^(|7„(a;,y) - foix^yA + I max{9„(a;),tu} - v{xA), 

where the supremum for /q can be restricted over D^. Since m{v) > w, we have | max{T„(x), vj} — 
b(x)| < |T„(x) — v{x)\ for X G T>, therefore 




1 


(Il7n-/Q|| 


P 

LP(D2) 


l/algc 

m(v)P 


11^ 


p \ 
LP{D)) 


holds as well. We conclude by applying successively the estimate (42) and Theorem 8. 
5.8. Proof of Theorem 9, lower bound. We only give a brief sketch: the proof follows classical lower bound techniques, bounding appropriate statistical distances along hypercubes, see [25, 30] and more specifically [15, 31, 34] for specific techniques involving Markov chains. We separate the so-called dense and sparse cases.

The dense case £2 > 0. Let —)■ K a family of (compactly supported) wavelets adapted 

to the domain D and satisfying Assumption 7. For j such that consider the 

family 

'^e,jix,y) = \V^\~^lD2{x,y) + exi^lix^y) 

\eAj 

where e S { — 1,1}^^ and 7 > 0 is a tuning parameter (independent of n). Since |i/’||oo < C'oo2l'^l = 
Coo 2 -^ and since the number of overlapping terms in the sum is bounded (by some fixed integer 
N), we have 

7 |T„|-i/ 2 | ^ exi:l{x,y)\ < < 7 . 

This term can be made smaller than [2)^1“^ by picking 7 sufficiently small. Hence Q^j{x,y) > 0 
and since J ipx = 0, the family Q^j(x, y) are all admissible mean transitions with common invariant 
measure u^dx) = lx>{x)dx and belong to a common ball in 23® ^^(D^). For X G Aj, define Tx ■ 
{—1,1}^J —)■ {—1, Ijl^^l by Tx{€x) = —ex and T\(e^) = if /r 7 ^ A. The lower bound in the dense 
case is then a consequence of the following inequality 

(43) limsup max ||P” ■ - PJ ( ^ -[[rv < 1, 

where P"j- is the law of (A„)„gT„ specified by the T-transition Tej = Qej 0 Qej and the initial 
condition £(^ 0 ) = u. 


We briefly show how to obtain (43). By Pinsker’s inequality, it is sufficient to prove that 
E” ■ [log —] can be made arbitrarily small uniformly in n (but fixed). We have 


E" 


-log- 


dP: 




dP" 


= - E K, 




log 


= - E 


(e) ,j i^u f 7 ^ul ) 

^u07 

^Tx(e)J 7 -^u) 


= -in+ii 


/D 2 


log( 


) Xu) 

QTUe),jix,y) 


'^e,j{x,y) 


Qe,jix,y)v{dx)dy 


< in 


n+ll 




1,2 V Qe,jix,y) 


using — log(l + z) < z'^ — z valid for z > —1/2 and the fact that n{dx) is an invariant measure for 
both QTxie),j and Q^j- Noting that 

^Tx{e),j{x,y) = Qe,jix,y) - cxiplix, y), 


^Tx{e),j{x,y) 

^e,j{x,y) 


^ 27 |T„|-V^Coo 2 ^' 

- l-7[T„[-i/2iVCoo2^- 


<7inn/' 


we derive 
hence the squared term within the integral is of order $\gamma^2|\mathbb T_n|^{-1}$ so that, by picking $\gamma$ sufficiently small, our claim about $\mathbb E^n_{\varepsilon,j}\Big[\log\dfrac{d\mathbb P^n_{\varepsilon,j}}{d\mathbb P^n_{T_\lambda(\varepsilon),j}}\Big]$ is proved and (43) follows.
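The elementary inequality $-\log(1+z) \le z^2 - z$ for $z \ge -1/2$, used in the dense case above, can be sanity-checked numerically. The sketch below evaluates the gap $(z^2 - z) + \log(1+z)$ on a finite grid (the grid range is an arbitrary illustrative choice) and also shows that the restriction on $z$ matters, since the inequality fails at $z = -0.9$.

```python
import numpy as np

# Gap between the two sides: (z^2 - z) - (-log(1 + z)); it should be nonnegative.
z = np.linspace(-0.5, 5.0, 2001)
gap = (z ** 2 - z) + np.log1p(z)
holds_on_grid = bool(np.all(gap >= -1e-12))

# Outside the stated range the inequality can fail, e.g. at z = -0.9.
z_bad = -0.9
gap_bad = (z_bad ** 2 - z_bad) + np.log1p(z_bad)
```

Near $z = 0$ the gap behaves like $z^2/2$, which is why the squared term is the one that drives the bound on the log-likelihood ratio.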


The sparse case €2 < 0. We now consider the family 
Qx,j{x,y) = 

with e\ S {—1,+1} and A £ Aj, with j such that ^ The lower bound 

then follows from the representation 

dP? ■ 

log ^ = 11; -‘-A log 2^ 

where P" ^ and P" denote the law of (X„)„gT„ specified by the T-transitions Q\j 0 Qxj and iy0iy 
respectively (and the initial condition ^(Xq,) = v); the w’s are such that sup„ max;^gA„ wa < !> 
and U" are random variables such that P7(1 ^a — ~^i) > C2 > 0 for some Ci,C 2 > 0. We omit 
the details, see e.g. [15, 31, 34]. 


5.9. Proof of Theorem 10. 


Proof of Theorem 10, upper bound. We closely follow Theorem 9 with c(n) = (nlog |T„_i|/|Tji_i|)^/^ 
and J = Jn such that 2'^'^" ~ c(n)“^ with d = 3 now. With 6 = Sn = p>cc{n), for k > 
71 - 2p)-iC'oo(2i? + 1), we have (5„ > A\'ipl\^{Rn + 1)|T*|-P 


Furthermore, since IV^aIp — for p = 1,2, 00 and 2'^l'''l < 2^^*^" < c(n) ^ it is not 

difficult to check that 


S2,n(t/’A) < max{|T|i,,i|Q|i,, |T||,_i}Ei,„(V'a) < C 

thanks to Assumption 6 and (36), and thus kiY, 2 ,n{g) < = C. We also have K2\'4'f\oo5n < 

K2Coo2l^l‘^/2^( n)pK < C'px, where C does not depend on n. Noting that fx = (/^Va) = 


J T'lf'^dv, we apply Theorem 5 (ii) to g = ij^x and derive 


]P(I/A,n - /aI > P>cc{n)) < 2exp - 


n i|T„i|p^>ir^c(n) 


C + C'pxt 

for every |A| < Jn as soon as k is large enough and the estimate 

T/p , /nlog |T„K “3(s.p, it) 


^ < c(n)^^ 


E 


([iiA-wiw.j)'’’<(77) 


follows thanks to the theory of [32] . The end of the proof follows Step 2 of the proof of Theorem 9 
line by line, substituting /g by /y. □ 


Proof of Theorem 10, lower bound. This is a slight modification of the proof of Theorem 9, lower bound. For the dense case $\varepsilon_3 > 0$, we consider a hypercube of the form

= |®^rUi,3(a;,j/,z)+7|T„|"1/2 ^ ex^l{x,y,z) 

AgAj 

where e £ {~1) 1}^^ with j such that and 7 > 0 a tuning parameter, while 

for the sparse case £3 < 0, we consider the family 

‘?x,j{x,y,z) = \‘D^\~'^lD3{x,y,z)+'y{^-^f^Y^‘^exiplix,y,z) 
with e\ e { — 1, +1}, A G Aj, and j such that ^ ^ 7(s+3(i/2 i/’r))^ proof then goes 

along a classical line. □ 

5.10. Proof of Theorem 11. 


Proof of Theorem 11, upper bound. Set Vnix) = l{x/ 2 <A„<a;} and v,,{x) = ^Biy)dy. 

By Propositions 2 and 4 in Doumic et al. [26], one can easily check that sup^jg^j vb(x) < oo and 
iiifxGD x^j^x) > 0 with some uniformity in B by Lemma 2 and 3 in [26]. For x € T>, we have 


\Bn{x) - B[x)\^ '^:^\'^n{x) - vb{x)\^ + 


suPa,gi,UB(a:)^ | 

mt,„(zDV„{x)P 


max{i;„(a:), vj} — Vu{x) I 


< 


Vn{x) - vb{x)?' + |un(a;) - '!;,.(ai)| 


By Theorem 4 (ii) with g = ^{x/2<.<x}^ one readily checks 

noo 

E[\Vnix) - Vj,{x)\^] = / pU^~'^F{\Vn{x) - Vjy{x)\ > u)du < \Tn\~^^‘^ 

Jo 

and this term is negligible. Finally, it suffices to note that 1 |ub1|®s ( 23 ) is finite as soon as 

11^11®= (D) is finite. This follows from 


f B(^x^ r 

vb{x) = / VB{y)QB{y,x)dy = - / UB(y)exp(- / 

Js TX Jo Jy 

We conclude by applying Theorem 8. 


B{2z) 


y/2 


TZ 


dzpy. 


□ 


Proof of Theorem 11, lower bound. This is again a slight modification of the proof of Theorem 9, lower bound. For the dense case $\varepsilon_1 > 0$, we consider a hypercube of the form

B,j{x) = Boix) + 7lT„l"i/2 ekifiix) 

A 

where e G { — 1,1}^^ with j such that and 7 > 0 a tuning parameter. By 

picking Bq and 7 in an appropriate way, we have that Bq and B^j belong to a common ball in 
23® ^(D) and also belong to G{r,L). The associated T-transition Ts. j defined in (17) admits as 
mean transition 

^ , , , B, A2y) ( fy B, A2z) , N 

Qs. 2 {x, dy) = —-—— exp ( - / —- -dz]l{y>xi 2 }dy 

Xy 3 TZ / 

which has a unique invariant measure vb, ^. Establishing (43) is similar to the proof of Theorem 9, 
lower bound, using the explicit representation for ^ with a slight modification due to the fact 
that the invariant measures vb^ j and ^ do not necessarily coincide. We omit the details. 

For the sparse case ei < 0, we consider the family 

B\Ax) = Bo{x)+j{^-^f^y^^ex'ifi{x) 

with e\ G { —1,+1}, A G Aj, with j such that ^ The proof is then 

similar. □ 
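6. Appendix

6.1. Proof of Lemma 13. The case $r = 0$. By Assumption 3,

$\big|\widetilde g(X_{u0}) + \widetilde g(X_{u1}) - 2\mathcal Q\widetilde g(X_u)\big| \le 2\big(|\widetilde g|_\infty + R|\widetilde g|_\infty\rho\big) \le 4(1 + R\rho)|g|_\infty.$

This proves the first estimate in the case $r = 0$. For $u \in \mathbb G_{n-1}$,

$\mathbb E\big[\big(\widetilde g(X_{u0}) + \widetilde g(X_{u1}) - 2\mathcal Q\widetilde g(X_u)\big)^2\,\big|\,\mathcal F_{n-1}\big] = \mathbb E\big[\big(g(X_{u0}) + g(X_{u1}) - 2\mathcal Q g(X_u)\big)^2\,\big|\,\mathcal F_{n-1}\big]$

$\le \mathbb E\big[\big(g(X_{u0}) + g(X_{u1})\big)^2\,\big|\,\mathcal F_{n-1}\big] \le 2\big(\mathcal P_0 g^2(X_u) + \mathcal P_1 g^2(X_u)\big) = 4\mathcal Q g^2(X_u)$

and for $x \in \mathcal S$, by Assumption 2,

$\mathcal Q g^2(x) = \int_{\mathcal S}g(y)^2\mathcal Q(x,y)\mathfrak n(dy) \le |\mathcal Q|_{\mathcal D}|g|_2^2$

since $g$ vanishes outside $\mathcal D$. Thus

(44) $\mathbb E\big[\big(\widetilde g(X_{u0}) + \widetilde g(X_{u1}) - 2\mathcal Q\widetilde g(X_u)\big)^2\,\big|\,\mathcal F_{n-1}\big] \le 4|\mathcal Q|_{\mathcal D}|g|_2^2$

hence the result for $r = 0$.

The case $r \ge 1$. On the one hand, by Assumption 3,

(45) $\big|2^r\big(\mathcal Q^r\widetilde g(X_{u0}) + \mathcal Q^r\widetilde g(X_{u1}) - 2\mathcal Q^{r+1}\widetilde g(X_u)\big)\big| \le 2^r\big(2R|g|_\infty(\rho^r + \rho^{r+1})\big) \le 4R(1+\rho)|g|_\infty(2\rho)^r.$

On the other hand, since

$|\mathcal Q g(x)| \le \int_{\mathcal S}|g(y)|\mathcal Q(x,y)\mathfrak n(dy) \le |\mathcal Q|_{\mathcal D}|g|_1,$

we also have

(46) $2^r\big|\mathcal Q^r\widetilde g(X_{u0}) + \mathcal Q^r\widetilde g(X_{u1}) - 2\mathcal Q^{r+1}\widetilde g(X_u)\big| = 2^r\big|\mathcal Q^r g(X_{u0}) + \mathcal Q^r g(X_{u1}) - 2\mathcal Q^{r+1}g(X_u)\big| \le 2^r 4|\mathcal Q|_{\mathcal D}|g|_1.$

Putting together these two estimates yields the result for the case $r \ge 1$.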

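6.2. Proof of Lemma 14. By Assumption 3,

$|\Upsilon_r(X_u, X_{u0}, X_{u1})| \le 2\sum_{m=0}^{r}2^m R|g|_\infty\rho^m(1+\rho) \le 4R|g|_\infty(1+\rho)(1-2\rho)^{-1}$

since $\rho < 1/2$. This proves the first bound. For the second bound we balance the estimates (45) and (46) obtained in the proof of Lemma 13. Let $\ell \ge 1$. For $u \in \mathbb G_{n-r-1}$, we have

$|\Upsilon_r(X_u, X_{u0}, X_{u1})| \le I + II + III,$

with

$I = \big|\widetilde g(X_{u0}) + \widetilde g(X_{u1}) - 2\mathcal Q\widetilde g(X_u)\big|,$

$II = \sum_{m=1}^{\ell\wedge r}2^m\big|\mathcal Q^m\widetilde g(X_{u0}) + \mathcal Q^m\widetilde g(X_{u1}) - 2\mathcal Q^{m+1}\widetilde g(X_u)\big|,$

$III = \sum_{m=\ell\wedge r+1}^{r}2^m\big|\mathcal Q^m\widetilde g(X_{u0}) + \mathcal Q^m\widetilde g(X_{u1}) - 2\mathcal Q^{m+1}\widetilde g(X_u)\big|,$

with $III = 0$ if $\ell \ge r$. For $u \in \mathbb G_{n-r-1}$, by (44), we successively have

$\mathbb E\big[I^2\,\big|\,\mathcal F_{n-r-1}\big] \le 4|\mathcal Q|_{\mathcal D}|g|_2^2,$

$II \le 4|\mathcal Q|_{\mathcal D}|g|_1\sum_{m=1}^{\ell\wedge r}2^m \le 8|\mathcal Q|_{\mathcal D}|g|_1 2^{\ell\wedge r}$

by (46), while for $\ell < r$,

$III \le 4R(1+\rho)|g|_\infty\sum_{m=\ell+1}^{r}(2\rho)^m \le 4R(1+\rho)(1-2\rho)^{-1}|g|_\infty(2\rho)^{\ell+1}$

by (45). The result follows.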